ABSTRACT

Simulations of galaxy evolution aim to capture our current understanding as well as to make 
predictions for testing by future experiments. Simulations and observations are often com- 
pared in an indirect fashion: physical quantities are estimated from the observational data and 
compared to models. However, many applications can benefit from a more direct approach, 
where the observing process is also simulated, so that the models are seen fully from the ob- 
server's perspective. To facilitate this, we have developed the Millennium Run Observatory 
(MRObs), a theoretical virtual observatory which uses virtual telescopes to 'observe' semi- 
analytic galaxy formation simulations based on the suite of Millennium Run (MR) dark matter 
simulations. The MRObs produces data that can be processed and analyzed using the standard 
observational software packages developed for real observations. At present, we produce images
in forty filters covering the rest-frame UV to infrared for two stellar population synthesis
models, for three different models of absorption by the intergalactic medium, and in two cos- 
mologies (WMAP1 and 7). Galaxy distributions for a large number of mock lightcones can 
be 'observed' using models of major ground- and space-based telescopes. The data include 
lightcone catalogues linked to structural properties of galaxies, pre-observation model images, 
mock telescope images, and Source Extractor products that can all be traced back to the higher 
level dark matter, semi-analytic galaxy, and lightcone catalogues available in the Millennium 
database. Here, we describe our methods and announce a first public release of simulated 
observations that emulate the SDSS, CFHT-LS (Wide and Deep), GOODS, GOODS/ERS, 
CANDELS, and the HUDF surveys. The MRObs browser, an online tool, further facilitates 
exploration of the simulated data. We demonstrate the benefits of a direct approach through a 
number of example applications: (1) deep galaxy number counts; (2) observed properties of 
galaxy clusters; (3) structural parameters of galaxies; and (4) identification of drop-out galax- 
ies. The MRObs enhances the range of questions that can be asked of semi-analytic models, 
allowing observers and theorists to work toward each other with virtually complete freedom 
of where to meet. 

Key words: virtual observatory tools - cosmology: theory - cosmology: observations - large- 
scale structure of Universe - galaxies: evolution - galaxies: clusters: general 



1 INTRODUCTION 

Understanding formation and evolution of galaxies is one of the 
main goals of extra-galactic astrophysics. This study is approached 
from two sides, an observational one and a theoretical one. On the 
one hand, observations become more and more detailed, produc- 
ing ever larger images and catalogues that need to be analyzed. 
On the other hand, theoretical research produces ever more refined
models describing the formation and evolutionary processes
in ever greater detail, often using sophisticated cosmological com- 
puter simulations that create enormous, physically motivated data 
sets. The increasing specialization and technical sophistication re- 
quired means that it becomes a problem to successfully match these 
two approaches as few scientists are familiar with all the details on 
both the observational and the theoretical side. For example, it is 
often difficult for non-experts to understand detailed galaxy forma- 
tion models or to predict how model parameter changes affect the 
predictions. Likewise, theorists are often unfamiliar with the extensive
processing and the inverse methods that need to be applied
to observations in order to derive physical properties that can be 
matched to the model predictions. 

From an observational perspective, the Sloan Digital Sky
Survey (SDSS) consortium played a pivotal role in opening up 
the results of one of the most sophisticated observational programs 
ever performed to the community. Through a public database of 
raw measurements, processed results, and "value-added" products, 
a great many hurdles were removed for using the results of the sur- 
vey. From a theoretical perspective, the Millennium Run Database 
(MRDB) was the first to make the results of large scale cosmo- 
logical simulations widely accessible to a broad-based audience. 
Analogous to the SDSS data, the richness of the theoretical data 
sets available in the MRDB has allowed a wide variety of scientific 
queries to be performed. 

The comparison between cosmological model predictions and 
observations has historically been performed mostly in one direc- 
tion only: physical quantities estimated from observations are com- 
pared with theoretical predictions. The latter are not affected by the 
issues that affect the observations, such as incompleteness, contam- 
ination, cosmic variance, signal-to-noise, or point spread function. 
These are assumed to be corrected fully in processing the observa- 
tions. We propose that comparison between the models and the ob- 
servations should also be performed in the opposite direction. The 
strength of this method lies in the fact that one can never be sure 
to extract the truth out of observations, but one will always know 
what the true answer is in a set of synthetic observations based 
on the simulations. In this paper we present an extension of the 
Millennium Run cosmological simulations project, which we will 
henceforth refer to as the Millennium Run Observatory (MRObs). 
It aims to bridge the gap between the two approaches by making 
the final step from realistic simulations to the observational plane. 
MRObs consists of a fully connected set of synthetic data prod- 
ucts combined into a unique online framework that ranges from the 
most fundamental simulations to realistic, synthetic observations. 

With the introduction of 'lightcones', the comparison be- 
tween simulations and observations has been greatly enhanced. 
This technique allows one to project the galaxy distribution pre- 
dicted for a set of discrete simulation snapshots along a virtual 
observer's line of sight, mimicking the main geometric and photometric
effects present in deep galaxy surveys (Davis et al. 1982, 1985;
Diaferio et al. 1999; Blaizot et al. 2005; Kitzbichler & White 2007).
However, even the lightcone approach to model-data comparisons
is still very much idealized. To illustrate this, let us consider
a typical observational scenario of determining the stellar
mass function of high redshift galaxies in a multi-wavelength imaging
survey. Such an analysis typically begins with the extraction of
sources and their photometric properties across a set of calibrated 
and registered filter images. Due to missed light, it is often neces- 
sary to make corrections to the measured magnitudes. Then, pho- 
tometric redshifts are estimated by fitting the photometry with a set 
of template spectra. After this step (or simultaneously) physical pa- 
rameters of the galaxies such as stellar masses, ages, or SFRs are 
estimated, again often using a set of template galaxy spectra. It is 
important to note that the results often depend on, e.g., the source 
detection and photometry method, the choice of template spectra, 
and the fitting method. In order to calculate the number of galaxies 
detected in different stellar mass bins over different redshift inter- 
vals, it is often required to calculate the "effective volume" of the 
survey. The latter is an estimate of the completeness of the sample, 
and usually depends on redshift, limiting magnitude, galaxy color
or size in complicated ways. This last step can be performed by
estimating the probability of recovering certain sources at a given 
survey depth. Such estimates typically depend on the true source 
population which is a priori unknown. At the end of the process, 
the stellar mass function estimate is used for comparison with other 
observational studies, or to constrain certain theoretical models or 
simulations of galaxy formation. It should be clear from the process 
outlined above that a great number of non-trivial steps need to be 
performed before any comparison with theory can be made. How 
better could we test all these steps than by processing the output 
from the simulations, for which all quantities are exactly known, 
through the same kind of data analysis pipeline as the real observa- 
tions? 



1.1 Goals of the MRObs 

We will take the process of simulating the galaxy population for 
comparison with observations into largely unexplored territory by 
simulating the observational process applied to the Millennium 
Simulations. The main aims of the MRObs are as follows: 

• Extend the Millennium Run project approach by producing data 
products most directly corresponding to observations, namely 
synthetic images and extracted source catalogs 

• Aid theorists in testing analytical models against observations

• Aid observers in making detailed predictions for observations 
and better analyses of observational data 

• Allow the community to subject the models to new kinds of tests 

• Allow observers and theorists to work toward each other from 
either direction with the freedom of where to meet 

• Allow detailed comparisons with synthetic observations pro- 
duced by other groups performing cosmological simulations 

• Allow calibration of observational analysis methods by making 
available synthetic data for which the entire underlying "reality" is 
known 

• Extend the realism with which semi-analytic models can address 
questions such as what is the probability that a z ~ 10 galaxy will 
be detected within a particular observational data set? 

• Provide a framework for future virtual theoretical observatories 



1.2 Connection to previous work 

Only recently have simulations become sophisticated enough to 
allow realistic visualizations of the galaxy population on a cos- 
mological scale. In order to illustrate the particular place that 
the MRObs occupies within this simulations landscape, we give 
a short overview of related work in the literature. Astronomical 
image simulation software has been developed and used previ- 
ously, mostly to aid in the development of data processing pipelines 
for new telescopes and instruments, for proposal planning, or for 
testing the accuracy of specific measurement tools (e.g. Bertin
2009; Dobke et al. 2010). Within the gravitational lensing community,
it has been standard practice to use simulated data to
assess the accuracy of cosmic shear measurements (Erben et al.
2001; Heymans et al. 2006; Forero-Romero et al. 2007). Simple
galaxy evolution models have been coupled to image simulators
to compare with observations (e.g. Bouwens et al. 1999, 2006),
and mock telescope data based on semi-analytic models (SAMs)
are also currently being used to investigate the significant data
and science challenges posed by future surveys (e.g., with the
Large Synoptic Survey Telescope (LSST); Connolly et al. 2010;
Gibson et al. 2011). The detailed morphological and kinematical
structures of gas and stars have been modeled using high resolution,
hydrodynamical simulations (of dark matter, gas and stars),
coupled with radiative transfer models that allow one to study the
effects of dust and orientation as a function of wavelength (e.g.
Jonsson et al. 2006, 2010; Robertson & Bullock 2008; Wuyts et al.
2009; Lotz et al. 2008, 2010). However, hydro simulations of sufficient
resolution are currently too small to construct lightcones on
cosmological scales. Also, unlike for SAMs, it is a much more time-consuming
process to match N-body hydro simulations to observations
after each change in the sub-grid physics modeling. As a result,
current hydro simulations of the galaxy population are substantially
further from the observations than semi-analytical models.

Blaizot et al. (2005) pioneered the production of realistic
artificial telescope data based on lightcones extracted from their
semi-analytic model. That paper already laid out most of the workflow
that we use here (see Fig. 1): dark matter particle simulations
are used to construct halo merger trees on which a semi-analytic
model is run. The output from the SAM is used to construct galaxy
lightcones that are used as input for artificial telescope image simulations.
Galaxies are extracted from the artificial images using standard
observational tools (e.g. SExtractor; Bertin & Arnouts 1996),
and the resulting galaxy catalogs are compared to the original simulations
at different levels, or to actual observations. Unfortunately,
however, the methods of Blaizot et al. (2005) were never employed
on a large scale, and in subsequent years the comparison between
SAMs and real observations has been mostly performed at the lightcone
level or even at the snapshot level, thereby sidestepping many
of the details involved in analyzing real telescope data that observers
typically have to go through. As we shall show, however,
numerous problems in the field of galaxy evolution could benefit
from a simulation that accounts for the entire observational process.
This leads to new insights involving details that are missed
by higher-level comparisons between data and simulations. By expanding
on the basic ideas of Blaizot et al. (2005), the MRObs aims
at making this possible.

1.3 Why the Millennium Simulations? 

Although in this paper we lay out the motivation and framework 
for producing synthetic data from cosmological simulations in gen- 
eral, the MRObs is based around the suite of MR simulations. 
Through the combination of simulations volume and particle res- 
olution, an active development of semi-analytic models, and an on- 
line database providing access to numerous data products, the MR 
is ideally suited for most of our purposes, as follows. 

(1) Volume and resolution: The MR has an almost ideal combination
of volume and particle mass resolution suitable for a wide
range of applications. The resolution^1 is sufficient to identify the
$> 5 \times 10^{10}\,M_\odot$ halos believed to host faint galaxies at very high
redshifts (Ouchi et al. 2005; Overzier et al. 2006), while probing
significantly down the stellar mass function with good statistics at
lower redshifts. The volume is large enough to probe a very wide
range of environments. The MR contains about 3,000 cluster-sized
objects at z = 0, of which about 25 are of the Coma type (i.e., more
massive than $10^{15}\,M_\odot$). The formation of all these systems can be
traced back to very high redshift for detailed studies of cluster formation
(Overzier et al. 2009). The large volume is also crucial for
constructing synthetic galaxy surveys covering many square degrees
without significant replications (Kitzbichler & White 2007;
Guo & White 2009; Overzier et al. 2009; Henriques et al. 2013).

1 Full convergence between the MR and the much higher resolution MR-II
simulation is near $10^{11}\,M_\odot$.

More recent dark matter simulations have been performed.
The MultiDark simulations span an 8× larger volume but with
a 10× lower mass resolution compared to the MR (Prada et al.
2011). The Bolshoi simulations have a 10× higher mass resolution,
but are 8× smaller (Klypin et al. 2011). Neither simulation
has yet released semi-analytic galaxy catalogs that can be used
to compare with actual observations. The somewhat limited mass
resolution of the MR has recently been extended by two orders
of magnitude through the MR-II simulation (Boylan-Kolchin et al.
2009). This simulation is extremely useful for further improving the
semi-analytic model that can then be re-applied to the original MR
simulation (Guo et al. 2011). The somewhat limited volume of the
MR has also recently been extended by two orders of magnitude
through the Millennium XXL (MXXL) simulation (Angulo et al.
2012), useful for studies of the rarest, most massive objects. However,
for the generation of mock lightcones, the MR is currently
still our preferred simulation (125× larger volume compared to the
MR-II and 7× higher resolution compared to the MXXL).

Recently, it has become possible to re-cast the suite of
MR simulation results in more updated cosmologies relative to
WMAP1 thanks to the re-scaling technique of Angulo & White
(2010; see §2.1.1).

(2) Semi-analytic models: As we will show, the Guo et al.
(2011) semi-analytic model applied to the MR is key to producing
our synthetic observations. This model gives detailed predictions 
for the evolving sizes and spin axes of the stellar mass in disks 
and/or bulges that are crucial for calculating angular sizes, bulge- 
to-disk ratios, inclinations and position angles. 

(3) Millennium Run Database: The dark matter and galaxy
catalogs of the MR project and related simulations have been
made widely accessible to the community through the MRDB
(Lemson & Virgo Consortium 2006). Interested users can query
the data in this database through various online services using standard
Structured Query Language (SQL). Regular updates to the
MRDB holdings provide public access to the latest model results,
ensuring that anyone can analyze the MR data and use its results
in their publications. We have now added to this system our synthetic
imaging data and extracted source catalogs that can be cross-correlated
with the underlying simulations data in the MRDB.
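
As a rough illustration of the kind of cross-correlation this enables, the sketch below joins a toy extracted-source catalog to a toy lightcone catalog by sky position using SQL. The table names (`lightcone`, `extracted`), the column names, the matching radius and all values are hypothetical and do not reflect the actual MRDB schema; an in-memory SQLite database stands in for the online service.

```python
import sqlite3

def build_toy_db():
    """Create an in-memory database with two hypothetical catalog tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lightcone (galaxy_id INTEGER, ra REAL, dec REAL, stellar_mass REAL)")
    con.execute("CREATE TABLE extracted (source_id INTEGER, ra REAL, dec REAL, mag_auto REAL)")
    con.executemany("INSERT INTO lightcone VALUES (?, ?, ?, ?)", [
        (1, 10.0000, -0.5000, 1.0e10),
        (2, 10.0100, -0.5000, 3.0e9),
        (3, 10.0500, -0.4800, 5.0e10),
    ])
    con.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?)", [
        (101, 10.00005, -0.50003, 22.1),  # close to galaxy 1
        (102, 10.05002, -0.47999, 20.4),  # close to galaxy 3
        (103, 10.20000, -0.30000, 25.0),  # no counterpart in the lightcone
    ])
    return con

def cross_match(con, radius_deg):
    """Pair extracted sources with lightcone galaxies inside a small box on the sky.

    Uses a flat-sky approximation, which is only reasonable for small fields
    close to dec = 0; a real cross-match would use proper angular separations.
    """
    query = """
        SELECT e.source_id, l.galaxy_id, l.stellar_mass, e.mag_auto
        FROM extracted AS e
        JOIN lightcone AS l
          ON ABS(e.ra - l.ra) < :r AND ABS(e.dec - l.dec) < :r
        ORDER BY e.source_id
    """
    return con.execute(query, {"r": radius_deg}).fetchall()

# Match within one arcsecond (expressed in degrees).
matches = cross_match(build_toy_db(), radius_deg=1.0 / 3600.0)
```

Sources without a counterpart (such as the third toy source) simply drop out of the inner join; a `LEFT JOIN` would keep them, which is useful when studying contamination.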

In summary, despite the age of the original MR (Springel et al.
2005), more recent dark matter simulations do not yet provide
equivalent data sets or the infrastructure required for developing 
a facility such as the MRObs. 

1.4 This paper 

In this paper, the first in a series comparing theory and observa- 
tions in the observational plane, we lay a framework for producing 
synthetic data from cosmological simulations, describe our main 
methods for future reference, present a number of user examples, 
and announce the public release of a large number of simulated 
surveys (synthetic images and catalogs). We also present various
new online services that allow one to interact with the synthetic observations
and the underlying lightcones, semi-analytic galaxy and
dark matter catalogs in the MRDB. The structure of this paper is 
as follows. In §2 we will present a concise overview of the MRObs 
and describe in detail all the steps that are needed in order to go 
from a pure dark matter simulation and semi-analytic galaxy cata- 
log to producing realistic synthetic observations. In §3 we present 
a detailed simulations example focusing on our synthetic images 
produced for the on-going CANDELS HST program. In §4 we il- 
lustrate the new types of questions that can be asked of the MRObs 
through a number of examples related to galaxy and galaxy cluster 
evolution. In §5 we present the public data release and the interac- 
tive online tools we have developed, and we summarise in §6. 



2 STRUCTURE OF THE MILLENNIUM RUN 
OBSERVATORY 

The MRObs makes available a fully interconnected set of data 
products covering the entire chain from dark matter simulations 
to synthetic observations and extracted data. In the MRObs, each 
subsequent step uses data products produced by previous steps, 
and almost all the data products are available for interrogation 
and public download for further analysis. A schematic overview 
of this process is given in the workflow diagram in Fig. 1, where
rectangles indicate an action and tilted rectangles represent data 
products that in each step can be linked to products elsewhere 
along the chain. The 8 main steps are: 

1. Dark matter particle simulation (DM density fields)

2. Identifying friends-of-friends (FoF) groups

3. Identifying (sub-)halos 

4. Constructing halo merger trees 

5. Applying semi-analytic galaxy models 

6. Observing galaxies on a synthetic light-cone 

7. Producing synthetic telescope images 

8. Extracting sources from synthetic images 

In this section we will describe each of the steps in more detail, 
focusing on the newly developed components that are most essen- 
tial to bridge the gap to real observations (steps 6-8), and refer to 
other work for the components described in detail elsewhere (steps 
1-5). 

2.1 The Millennium Suite of Dark Matter Simulations 

The evolution of the dark matter distribution with time is
believed to be mainly driven by the initial matter power spectrum,
gravity, and the expansion rate of the universe, and
can be taken either from direct N-body simulations (e.g.
Davis et al. 1985; Jenkins et al. 1998; Springel et al. 2005), or
from (semi-)analytically constructed dark matter halo trees
(e.g. Press & Schechter 1974; Kauffmann & White 1993;
Lacey & Cole 1994; Somerville & Kolatt 1999; Sheth et al.
2001; Neistein & Dekel 2008). In the suite of cosmological
simulations centered around the Millennium Run project, the dark
matter simulation was performed with versions of the cosmological
simulation code Gadget (Springel et al. 2005). The suite
of simulations consists of (1) a $2160^3$ particle simulation with
particle mass $8.6 \times 10^{8}\,h^{-1}\,M_\odot$ and periodic box length of 500
$h^{-1}$ Mpc (the Millennium Run (MR); Springel et al. 2005), (2)
a $2160^3$ particle simulation with mass $6.9 \times 10^{6}\,h^{-1}\,M_\odot$ and
periodic box length of 100 $h^{-1}$ Mpc (the Millennium-II (MS-II);
Boylan-Kolchin et al. 2009), and (3) a $6720^3$ particle simulation
with mass $6.2 \times 10^{9}\,h^{-1}\,M_\odot$ and periodic box length of 3
Gpc (the Millennium-XXL (MXXL); Angulo et al. 2012). All
simulations follow the gravitational growth as traced by these
particles from z = 127 to z = 0 in a ΛCDM cosmology ($\Omega_m = 0.25$,
$\Omega_\Lambda = 0.75$, $h = 0.73$, $n = 1$, $\sigma_8 = 0.9$) most consistent with
the Wilkinson Microwave Anisotropy Probe (WMAP) year 1 data
(Spergel et al. 2003). The dark matter particle distributions were
stored at 64 discrete epochs ("snapshots").
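
The quoted particle masses follow from the box size, particle number and matter density through m_p = Ω_m ρ_crit L³ / N. The short check below is a sketch under two assumptions that are not stated in the text: the standard value ρ_crit ≈ 2.775 × 10^11 h² M_⊙ Mpc⁻³ for the critical density, and a box length of 3000 h⁻¹ Mpc for the MXXL. The function and variable names are illustrative only.

```python
RHO_CRIT = 2.775e11  # critical density in h^2 M_sun / Mpc^3 (assumed standard value)
OMEGA_M = 0.25       # matter density parameter of the WMAP1 cosmology used by the MR

def particle_mass(omega_m, box_mpc_h, n_per_side):
    """Return the dark matter particle mass in h^-1 M_sun.

    The total mass in the periodic box is shared equally among n_per_side^3 particles.
    """
    total_mass = omega_m * RHO_CRIT * box_mpc_h ** 3
    return total_mass / n_per_side ** 3

# (box length in h^-1 Mpc, particles per side)
runs = {
    "MR": (500.0, 2160),
    "MS-II": (100.0, 2160),
    "MXXL": (3000.0, 6720),  # assumption: "3 Gpc" is read as 3000 h^-1 Mpc
}
masses = {name: particle_mass(OMEGA_M, box, n) for name, (box, n) in runs.items()}

# Ratios relevant for choosing a simulation for mock lightcones.
volume_ratio_mr_to_msii = (runs["MR"][0] / runs["MS-II"][0]) ** 3
mass_ratio_mxxl_to_mr = masses["MXXL"] / masses["MR"]
```

Under these assumptions the three masses agree with the quoted values to better than one per cent, the MR volume is 125 times that of the Millennium-II, and the MXXL particle mass is about 7 times that of the MR.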

2.1.1 Scaling of Cosmological Parameters 

The suite of MR simulations was performed using the now disfavored
WMAP1 cosmology. While the lower value of $\sigma_8$ preferred
by the more recent WMAP7 data will cause the growth of dark
matter structure to be delayed with respect to a WMAP1 cosmology,
its effect on galaxy formation models is less straightforward
to infer. Running simulations with multiple cosmologies is a time-consuming
process. Instead, the MRObs project uses a recent technique
introduced by Angulo & White (2010) in which the output
from a cosmological N-body simulation in one cosmology (e.g.,
WMAP1) can be scaled to represent the growth of structure in another
cosmology (e.g., WMAP7). Tests comparing direct N-body
simulations done in two cosmologies with a simulation that was
scaled from one to another cosmology show that power spectra are
reproduced to better than 3% at all scales. In the MRObs the technique
is applied to halo catalogues. Properties such as mass, concentration,
velocity dispersion and spin are reproduced
at about the 10% level or better (Angulo & White 2010; see also
Ruiz et al. 2011). Guo et al. (2012) give the properties of semi-analytic
galaxies in the MR and MR-II scaled to the WMAP7 cosmology.

2.2 Dark matter halos 

The Millennium simulations output the dark matter phase-space
distribution at 61 different epochs at z < 60. The spacing between
these outputs is roughly equal in the log of the expansion
factor, specifically, ~300 Myr for z < 2 and ~100 Myr for z > 6.
In each of these snapshots, DM haloes are found using a FoF algorithm
(Davis et al. 1985) with a linking length parameter equal
to one fifth of the mean inter-particle separation. Within each FoF
halo, self-bound substructures are identified using the SubFind algorithm
(Springel et al. 2001).
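
The friends-of-friends idea can be illustrated with a minimal sketch: particles closer than the linking length are "friends", and groups are the connected sets of friends. The code below is a brute-force toy version with a union-find structure, without periodic boundaries and without the SubFind step; it is not the group finder used for the Millennium simulations, and the function name and test positions are hypothetical.

```python
import numpy as np

def friends_of_friends(positions, box_size, b=0.2):
    """Label particles by FoF group using linking length b times the mean spacing.

    Brute-force O(N^2) pair search with no periodic wrapping; for illustration only.
    """
    n = len(positions)
    linking_length = b * box_size / n ** (1.0 / 3.0)
    parent = list(range(n))  # union-find forest: each particle starts as its own group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(n):
        dist = np.linalg.norm(positions[i + 1:] - positions[i], axis=1)
        for j in np.nonzero(dist < linking_length)[0]:
            root_i, root_j = find(i), find(i + 1 + int(j))
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two groups
    labels = np.array([find(i) for i in range(n)])
    return labels, linking_length

# Eight toy particles in a box of side 8: mean spacing 4, linking length 0.8.
positions = np.array([
    [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0],  # chain linked through the middle particle
    [5.0, 5.0, 5.0], [5.3, 5.0, 5.0],                   # close pair
    [2.0, 6.0, 1.0], [7.0, 1.0, 4.0], [4.0, 2.0, 7.0],  # isolated particles
])
labels, linking_length = friends_of_friends(positions, box_size=8.0)
```

The first three particles end up in one group even though the two end points are separated by more than the linking length, which is the defining property of the method.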

For each subhalo, at each output time, a unique descendant in 
subsequent snapshots is assigned as the subhalo which contains the 
majority of the most bound particles (slightly different definitions 
have been used among the different Millennium simulations). Fi- 
nally, these pointers are arranged in a tree-like data structure which 
allows one to access the full mass evolution of a given object across
time. This structure - a merger tree - represents the backbone and 
starting point for our post-processing simulations of galaxy forma- 
tion. 
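
A minimal sketch of such a pointer-based tree is given below. The record layout, the toy values and the choice to define the main branch by the most massive progenitor are assumptions made for illustration; they are not the storage format or the convention of the Millennium database.

```python
from collections import defaultdict

# Hypothetical records: (subhalo_id, snapshot, mass, descendant_id); -1 means no descendant.
subhalos = [
    (0, 0, 1.0, 3), (1, 0, 0.4, 3), (2, 0, 0.2, 4),
    (3, 1, 1.5, 5), (4, 1, 0.3, 5),
    (5, 2, 2.0, -1),
]

def build_tree(records):
    """Index subhalos by id and invert the descendant pointers into progenitor lists."""
    by_id = {rec[0]: rec for rec in records}
    progenitors = defaultdict(list)
    for sub_id, _snapshot, _mass, descendant in records:
        if descendant != -1:
            progenitors[descendant].append(sub_id)
    return by_id, progenitors

def main_branch(root_id, by_id, progenitors):
    """Walk back in time from root_id, following the most massive progenitor each step."""
    branch = [root_id]
    while progenitors[branch[-1]]:
        candidates = progenitors[branch[-1]]
        branch.append(max(candidates, key=lambda sub_id: by_id[sub_id][2]))
    return branch

by_id, progenitors = build_tree(subhalos)
branch = main_branch(5, by_id, progenitors)
# (snapshot, mass) pairs along the main branch, ordered from early to late times.
mass_history = [(by_id[i][1], by_id[i][2]) for i in reversed(branch)]
```

Because every subhalo stores only one descendant pointer, the full tree can be rebuilt from a flat table, which is what makes the structure convenient to hold in a relational database.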
Figure 1. Schematic overview of the Millennium Run Observatory workflow. The blue rectangles indicate an action, while the red tilted rectangles represent
data products that in each step can be linked to products elsewhere along the chain. Thick arrows indicate that there are direct links between data products,
while thin arrows indicate that indirect links can be made using cross-correlation. Dashed lines link products to actions from which they result, or by which
they are used. Shaded rectangles indicate products or actions that have been updated or are introduced in this paper for the first time. The workflow starts with
an N-body dark-matter-only simulation (see §2.1). Dark matter particles are grouped together using a friends-of-friends group finder and decomposed into
halos and sub-halos using a halo-finder algorithm (see §2.2). This results in positions, velocities, spin vectors and masses of dark matter halos in an evolving
ΛCDM universe. A dark matter halo merger tree is constructed and stored in a data base. Optionally, a scaling of the cosmological parameters can be applied
to the halo merger tree (see §2.1.1). The merger tree forms the backbone for a semi-analytic model of galaxy formation that tracks the growth of galaxies
inside halos based on simple recipes for, e.g., gas cooling, star formation, supernova and AGN heating, gas stripping, and merging between galaxies (see §2.3).
In each time step or snapshot, the resulting physical properties of each galaxy in the semi-analytic galaxy population are used to select appropriate stellar
population templates from a library of spectral energy distributions to model the rest-frame, dust-attenuated spectra or colors of each galaxy (see §2.3.2). A
pencil beam-shaped "lightcone" is carved out through the simulation volume using a modified version of the code MoMaF, selecting only galaxies from those
snapshots that correspond to the cosmic time at the co-moving distances along the line-of-sight in the observer's frame of reference (see §2.4). Multi-band
apparent magnitudes are calculated and corrected for absorption by neutral hydrogen in the inter-galactic medium (see §2.5). This lightcone is then projected
onto a plane giving virtual sky positions for each galaxy in terms of right ascension and declination. The positions, shapes, sizes and observed-frame apparent
magnitudes are used to build a "perfect" pre-observation image of the sky using a modified version of SkyMaker (see §2.6). The perfect image is fed into the
telescope simulator that applies a detector model (pixel scale, readout noise, dark current, sensitivity, gain), a sky background model, PSF convolution, and
Poissonian object and sky noise for a particular survey description (see §2.6.3). The MRObs produces a realistic, synthetic telescope image in .fits format for
further scientific analysis. Source Extractor is run on the simulated image and the output catalogs can be analyzed analogously to the catalogs constructed from
real observations (see §2.7).
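
The telescope-simulator step named in the caption above (PSF convolution, sky background, Poisson noise, read noise and gain) can be sketched in a few lines. This is a simplified stand-in and not the modified SkyMaker code: the Gaussian PSF, the neglect of dark current, the function names and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_psf(size, fwhm_pix):
    """Normalised 2-D Gaussian kernel standing in for a real PSF model."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    psf = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def observe(perfect, psf, sky_rate, exptime, gain, read_noise, rng):
    """Turn a noiseless image (electrons/s per pixel) into a mock exposure in ADU."""
    # Circular FFT convolution: pad the PSF to the image size and centre it on pixel (0, 0).
    kernel = np.zeros_like(perfect)
    k = psf.shape[0]
    kernel[:k, :k] = psf
    kernel = np.roll(kernel, (-(k // 2), -(k // 2)), axis=(0, 1))
    blurred = np.fft.irfft2(np.fft.rfft2(perfect) * np.fft.rfft2(kernel), s=perfect.shape)
    # Expected electrons per pixel from objects plus a uniform sky background.
    expected = np.clip(blurred + sky_rate, 0.0, None) * exptime
    electrons = rng.poisson(expected).astype(float)                   # object and sky shot noise
    electrons += rng.normal(0.0, read_noise, size=perfect.shape)      # detector read noise
    return electrons / gain                                           # convert electrons to ADU

rng = np.random.default_rng(42)
perfect = np.zeros((64, 64))
perfect[32, 32] = 1000.0  # a single point source in the "perfect" image
psf = gaussian_psf(15, fwhm_pix=3.0)
image = observe(perfect, psf, sky_rate=5.0, exptime=100.0, gain=2.0, read_noise=3.0, rng=rng)
```

Because the input image is known exactly, any quantity measured on `image` (for example a recovered flux) can be compared directly with its true value, which is the point of the direct approach.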



2.3 Synthetic galaxy catalogues 

2.3.1 Semi-analytical galaxy formation models

The N-body simulations used in the MRObs follow dark matter
particles only. To add predictions about the baryonic content of
the simulations, we use a technique referred
to as semi-analytical modelling (SAM) (e.g. White & Frenk
1991; Kauffmann et al. 1993; Cole et al. 1994; Kauffmann et al.
1999; Somerville & Primack 1999; Kauffmann & Haehnelt
2000; Somerville et al. 2001; Springel et al. 2001; Hatton et al.
2003; Kang et al. 2005; De Lucia & Blaizot 2007; Guo et al. 2011;
Somerville et al. 2011). Using simplified descriptions ("recipes")
for the baryonic physics, these models follow the evolution of the
galaxies within the skeleton provided by the dark matter halo merger
trees defined in the previous steps. These recipes include gas
cooling, star formation, reionization heating, supernova feedback,
mergers, black hole growth, metal enrichment and feedback from
active galactic nuclei. The recipes are constrained by local observations
and by physical insight.



This technique is much less computationally expensive than 
adding full hydrodynamics to the basic simulations. Once the back- 
bone formed by the dark matter structure has been established, the 
semi-analytic modeling of the galaxies can be repeated many times 
in order to find the recipes and parameters that are required to match
the observations^2.

The Millennium Run Database (MRDB;
Lemson & Virgo Consortium 2006) contains galaxy catalogues
from two SAMs: L-Galaxies, created at the Max-Planck-Institute
for Astrophysics in Munich (Springel et al. 2001;
Croton et al. 2006; De Lucia & Blaizot 2007; Bertone et al. 2007;
Guo et al. 2011), and GalForm, created by the University of
Durham (Cole et al. 2000; Benson et al. 2003; Baugh et al. 2005;
Bower et al. 2006). Compared to earlier models, also stored in
the MRObs, the latest version of the Munich model by Guo et al.
(2011) that we focus on here includes improved prescriptions
for supernova feedback, gas stripping, galaxy merging, and bulge
formation (see Croton et al. 2006; De Lucia & Blaizot 2007;
Bertone et al. 2007; Guo et al. 2011; Henriques et al. for
successive versions of the Munich model applied to the MR). The
output of the SAM is stored for each of the 64 snapshots, thus
sampling the evolution of the galaxy population every few 100
Myr. The SAM calculations, however, are computed on a finer grid
consisting of 20 steps of about 10 Myr each between each pair of
snapshots. This ensures that the properties of galaxies are modeled
on time-scales appropriate for a wide range of star formation
histories, including brief bursts of star formation that may happen
in between snapshots.

The galaxies resulting from the semi-analytic model naturally 
span a wide variety in star formation histories (SFHs), correspond- 
ing to the different gas accretion and merger histories of individual 
galaxies. The relational database of the MRObs allows us to recon- 
struct these SFHs in great detail. It is important to keep in mind 
the distinction between the SFH of the galaxy that forms the main 
branch in a galaxy merger tree, and that of the stars in all the pro- 
genitors of a descendant id entified at some snapshot. As shown by 
iDe Lucia & Blaizotl d2Q07h this typically results in large differences 
between the time it took for the stellar mass to be formed ('forma- 
tion time') and the time it took for that mass to assemble into a 
single galaxy ('assembly time'). An example is shown in Fig. [2] 
showing the stellar populations of all the different branches that 
form the merger tree of a single galaxy at z = 0. When observers 
study the star formation history of a particular galaxy selected at 
some redshift, they thus do not necessarily study the SFH of a single
galaxy, but rather the SFH of all its progenitors (weighted by
stellar mass).
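
The distinction between formation time and assembly time can be illustrated with a toy merger tree. The sketch below is only an illustration, not an MRObs database query: the tree layout, the masses, and the 50 per cent threshold used to define both times are assumptions made for this example.

```python
import numpy as np

# Toy merger tree (hypothetical numbers): stellar mass in each progenitor
# branch at a set of snapshot lookback times, in units of 1e10 Msun.
# Row 0 is the main branch; rows 1-2 are galaxies that merge into it.
lookback_gyr = np.array([12.0, 10.0, 8.0, 6.0, 4.0, 2.0, 0.0])
branch_mass = np.array([
    [0.2, 0.5, 1.0, 1.5, 4.0, 4.5, 8.0],   # main branch (jumps are mergers)
    [0.3, 0.8, 1.5, 2.0, 0.0, 0.0, 0.0],   # merged into the main branch by 4 Gyr
    [0.2, 0.6, 1.2, 2.0, 2.5, 3.0, 0.0],   # merged into the main branch by 0 Gyr
])

def half_mass_lookback(mass_history, lookback, final_mass, fraction=0.5):
    """Lookback time of the first snapshot with mass >= fraction * final_mass."""
    reached = mass_history >= fraction * final_mass
    return lookback[np.argmax(reached)]   # argmax returns the first True

final_mass = branch_mass[0, -1]
all_progenitors = branch_mass.sum(axis=0)   # stars in every branch of the tree
main_branch = branch_mass[0]                # stars in the main progenitor only

formation_time = half_mass_lookback(all_progenitors, lookback_gyr, final_mass)
assembly_time = half_mass_lookback(main_branch, lookback_gyr, final_mass)
print(f"formed {formation_time} Gyr ago, assembled {assembly_time} Gyr ago")
```

In this toy tree half of the final stellar mass already exists, spread over several progenitors, well before half of it resides in the main branch, which is the qualitative behaviour described above.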

Similar to real galaxies, galaxies in the MRObs span a very 
large range in SFHs. In Fig. 3 we show the average SFHs for star-forming
and quiescent galaxies in the MRObs. These SFHs were
determined by summing the SFRs of all the progenitors of 100
galaxies selected at $z \approx 2$. For systems having SFRs of
$>10\,M_\odot\,\mathrm{yr}^{-1}$ and $M_* \sim 10^{10}\,M_\odot$ (e.g.,
similar to Lyman Break Galaxies, LBGs), the SFHs are rising (blue line in
Fig. 3), roughly as derived from observations of LBGs (Papovich et al.
2011). For systems having SFRs of $<10\,M_\odot\,\mathrm{yr}^{-1}$ and
$M_* \sim 10^{11}\,M_\odot$ (e.g.,
similar to Distant Red Galaxies, DRGs), the average SFH rapidly



2 Some proponents of cosmological hydrosimulations as well as observers 
claim that semi-analytic models do not predict anything because they are 
'tuned to fit the data'. This argument, however, does not make much sense.
The goal is to try to understand the formation and evolution of galaxies. 
Whether our current understanding is cast in the sub-grid physics of hy- 
drosimulations, in the parameters and recipes of semi-analytic models, in 
the interpretations given to observations in the literature, or in the formulae 
printed in our cosmology textbooks does not matter. All these efforts lead 
to new 'predictions' that need to be tested.

Figure 2. The merger history of a single galaxy selected from the MR. The
dark matter halo properties stored in a database are used as the backbone
for a semi-analytic model of galaxy formation that tracks the growth of
galaxies inside halos based on simple recipes for, e.g., gas cooling, star
formation, supernova and AGN heating, gas stripping, and merging between
galaxies. In each time step or snapshot of the simulation, the resulting
physical properties of each galaxy in the semi-analytic model are used to select
appropriate stellar population templates from a library of spectral energy
distributions to model the rest-frame spectra or colors of each galaxy. In the
example shown here, the color coding indicates the rest-frame $(g' - r')_{AB}$
color of all the galaxies that are part of the merger tree of a single galaxy
selected in simulation snapshot 63 ($z = 0$), starting from snapshot 10
($z \approx 12$). The other two axes show the 2D positions of these galaxies
in the simulation volume.



declines after z ~ 5 (red line), analogous to the best-fit SFHs of
observed DRGs (e.g., Kriek et al. 2006).



2.3.2 Multi-Wavelength Model Predictions 

SAMs predict physical properties of galaxies, such as their stel- 
lar masses, ages, metallicities, and gas content. One common way 
of testing the models is to compare them to the same physical 
properties derived from the SEDs of observed galaxies. At the 
least, this approach depends on having well-established measuring
techniques and accurate stellar population synthesis models
(see Tinsley 1980). In practice, this sort of analysis typically includes
numerous assumptions, and certain features of the galaxies
can never be recovered from the observations in full (e.g., their ex- 
act star formation history or dust attenuation). In the MRObs the 
application of stellar population synthesis models and dust recipes 
allows one to make detailed spectro-photometric predictions for the
model galaxies by adding up synthetic spectra corresponding to the 
different generations of stars that these galaxies consist of at any 
moment. The great predictive power of SAMs in terms of the ob- 
servable, photometric properties of galaxies is in large part based 
on the spectral synthesis modeling of the stellar populations being 
formed in the semi-analytic model galaxies according to their SFRs 
at any given time (see Fig. 2 for an example of a typical galaxy).

The predicted multi-wavelength properties of galaxies depend
on the spectral synthesis model used. These models are currently
still affected by gaps in our understanding of stellar evolution
(e.g., see Conroy et al. 2009), preventing us from making
unambiguous predictions for the main galaxy observables. For example,
two well-known synthesis models (by Bruzual & Charlot
(2003, "BC03") and by Maraston (2005, "M05")) give conflicting
observational predictions for certain galaxy populations, particularly
where thermally-pulsating asymptotic giant branch (TP-AGB) stars
influence the rest-frame near-IR emission of galaxy populations
(e.g., Tonini et al. 2010; Henriques et al. 2011, and references
therein). The impact of the different models implemented
in the MRObs is illustrated in Fig. 4, where we show the observed
$K - 4.5\,\mu$m vs. $I - K$ color-color diagram for galaxies at z ~ 2 in
our simulations. The M05 model (right panel) predicts significantly
redder $K - 4.5\,\mu$m colors than the BC03 model (left panel),
especially for galaxies between 1 and 2 Gyr in age. In contrast,
the $I - K$ colors are consistent between the two models. A similar
plot based on SSP model curves and real data was shown in
Maraston et al. (2006, figure 2 in that paper).

Figure 3. The average SFHs of star-forming and quiescent galaxies identified
in the SAMs at $z \approx 2$. The LBG-like systems (blue line) show a
rising SFH, analogous to that derived from observations (Papovich et al.
2011, dashed line). DRG-like systems have SFHs that sharply decline after
$z \approx 5$.

The uncertainty surrounding these stellar population models
also affects our ability to derive physical quantities from observations.
Observers rely in large part on fitting spectral templates
to the data to obtain, e.g., photometric redshifts, stellar masses
and mass functions, star formation rates, and ages, all of which are
essential for constraining the evolution of galaxies across time. In
order to aid the community in performing the best comparisons
with observations, the MRObs therefore provides mock observer-frame
data in a large number of filters and for different spectral synthesis
models. We use the multi-wavelength filter catalogs produced
by the semi-analytic models run using both the BC03 and
M05 spectral synthesis models by Henriques et al. (2011, 2012).
We furthermore model the effect of dust on the predicted colors
and magnitudes using the dust treatment recipe first introduced by
Kitzbichler & White (2007) and adopted by Guo & White (2009),
Guo et al. (2011), and Henriques et al. (2012). The highly modular
approach of the MRObs (see Fig. 1) makes it straightforward to add
alternative or improved models in the future.

Figure 4. The choice of stellar population synthesis model affects the color
distributions of galaxies. We illustrate this by showing the optical-infrared
color-color diagrams for galaxies at z ~ 1.9 selected from the lightcones
modeled using BC03 (Bruzual & Charlot 2003, left panel) and using M05
(Maraston 2005, right panel). Galaxies are color-coded according to their
mass-weighted age (see legend on the right). See also Maraston et al. (2006,
figure 2 in that paper). The MRObs offers the choice between different
spectral synthesis models.

Figure 5. Examples of filter sets currently available in the Millennium Run
Observatory: space-based UV (GALEX, HST WFC3-UVIS), ground-based
optical (Johnson, SDSS, VIMOS), space-based optical (HST WFPC2,
ACS), ground-based near-IR (Johnson, VISTA), space-based near-IR (HST
NICMOS, WFC3-IR) and mid-IR (Spitzer/IRAC). Typical model galaxy
spectra at z = 0, z = 1, z = 2, and z = 4 are shown for reference (grey
curves).



We currently provide magnitudes in 40 bands covering the
FUV to the mid-IR as observed by major telescopes and instruments
(Table 1). The filter bandpasses together with the spectra of
typical galaxies at z = 0-4 are illustrated in Fig. 5.

Figure 6. The construction of the lightcones. Left panel: The lightcone is constructed by replicating the simulation box until the comoving distance
corresponding to the desired limiting redshift is reached. In this example, the original co-moving size of the MR simulation is extended from 500 Mpc/h to ~7000
Mpc/h, corresponding to $z \approx 10$ for h = 0.73. A conical volume is carved out from the volume that has now been expanded through the box replication
process, and galaxies are selected from the overlap between the cone and the replicated volume. In order to model the relation between co-moving distance 
and redshift, at any point along the cone galaxies are selected only from that snapshot that is closest in redshift to the one corresponding to the co-moving 
distance along the line-of-sight. We do not interpolate over physical properties of the galaxies (they are assumed to be relatively constant between two con- 
secutive snapshots), but apparent magnitudes and colors are interpolated, to make sure that galaxies have the right values for their redshifts. Middle panel: 
The starting position and orientation of the lightcone through the simulation box can be chosen such that the entire replicated volume can be constructed out 
of conical segments (drawn in blue and yellow) drawn from the original volume without passing through any region twice. Different 'views' of the simulated 
universe can be created by changing the starting points or orientations of the cones. Narrow pencil beams can be constructed out to very high redshift without 
replications, while very wide-field surveys can be made by keeping the limiting redshift low. Much larger volume surveys can be generated if the scientific 
application of interest allows some degree of replication. Right panel: Multiple (semi-)independent lightcones can be extracted from the simulations box by 
choosing different starting positions (the position of the observer, z = 0) and orientations (specified by the two angles $\theta$, $\phi$). The 24 "field" lightcones from
Henriques et al. (2012), each with an opening angle of 1.4° × 1.4°, are indicated.



2.4 Lightcone Construction 

The snapshots of data (in time or in redshift) that are produced by 
numerical simulations present an idealized view of the evolving 
universe that is different from data resulting from observations of 
the extra-galactic sky. In order to allow for more realistic and direct 
comparisons between the model predictions and observations, we 
construct so-called "lightcones" in which galaxies that were sim- 
ulated at discrete snapshots are re-arranged in order to mimic the 
relation between the distance along an observer's line of sight and 
cosmic time as accurately as possible. 

In this paper we use lightcones introduced in Henriques et al.
(2012), to which we add structural properties, and a set of new
lightcones pointed at specific objects. These lightcones are built
using a version of the Mock Map Facility (MoMaF) code of
Blaizot et al. (2005). The lightcone technique has been described
in detail in the original paper (also see, e.g., Kitzbichler & White
2007; Guo & White 2009; Overzier et al. 2009; Henriques et al.
2012). Because of its importance to the MRObs, here we give a
short review of the technique, and describe a modification of MoMaF,
developed specifically for the MRObs, that allows us to create
lightcones aimed at specific objects of interest in the simulations.



2.4.1 Review of lightcone methods

The MR predicts the detailed properties of the dark matter and the
galaxies it contains for a closely spaced set of snapshots that are
sufficient to compare with observations from z = 0 to the highest
redshifts currently observed. In principle the simulations box
probes a sufficiently large volume to construct large pencil beam
surveys. For example, the total simulations volume of
$(500\,h^{-1}\,\mathrm{Mpc})^3$ is equivalent to that probed in a pencil
beam survey out to z = 10 and measuring 4 square degrees on the sky. On
the other hand, the comoving distance to z = 10 of
~7,000 $h^{-1}$ Mpc is much larger than the side of the simulations box
of 500 $h^{-1}$ Mpc (900 $h^{-1}$ Mpc when taking the diagonal through the box).
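
The numbers quoted above can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with Ωm = 0.25 and ΩΛ = 0.75 and ignores radiation. These parameter values are not stated in this section and are an assumption of the example (they are close to the WMAP1-based values of the original MR).

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc): the Hubble distance in Mpc/h

def comoving_distance(z, omega_m=0.25, omega_l=0.75, n_steps=20000):
    """Line-of-sight comoving distance in Mpc/h for flat LCDM (assumed)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    # Trapezoidal integration of dz / E(z)
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return C_OVER_H0 * integral

box_side = 500.0                              # Mpc/h, side of the MR box
d_c = comoving_distance(10.0)                 # Mpc/h
n_replications = int(np.ceil(d_c / box_side))

# Comoving volume of a 4 deg^2 pencil beam out to z = 10, relative to the box
solid_angle = 4.0 * (np.pi / 180.0) ** 2      # steradians
beam_volume = solid_angle / 3.0 * d_c ** 3
volume_ratio = beam_volume / box_side ** 3
print(d_c, n_replications, volume_ratio)
```

Under these assumptions the distance comes out close to 7,000 Mpc/h, so the box has to be stacked about 14 to 15 times along the line of sight, while the beam volume is of the same order as the box volume.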

Blaizot et al. (2005) solved this problem by 'replicating' the
simulations box along an artificial observer's line of sight until the
maximum comoving distance desired is reached, and then extracting
a conical pencil beam out of the enlarged volume. They explain
that care must be taken to avoid "perspective effects" caused by
replication of the same part of the universe in certain directions.
Whereas Blaizot et al. (2005) solve this by adding random rotations
and translations of the boxes, thereby introducing discontinuities
in the galaxy distribution, Kitzbichler & White (2007) showed that
for certain orientations of the lightcones through the MR box and
for a small enough opening angle of the cone, the lightcone can be
constructed without passing through any region of the simulations
twice (or at least ensuring that copies are widely separated in
redshift if replication occurs). It is the latter method that we use for all
pencil-beam lightcones in the MRObs.

We illustrate the box-replication process in Fig. 6. In the panel
on the left, a virtual lightcone is drawn in a much enlarged MR
volume constructed using the box replication method. The opening
angle of the cone, its origin and angles of intersection with the original
MR box are chosen such that every cone segment (indicated by 
the blue-yellow segments) can be extracted from the original MR 
volume in such a way as to almost cover the complete simulation 
volume and without passing through any region of the box twice (as 
illustrated in the middle panel). Many (semi-)independent pencil 
beam surveys can be constructed from the MR by changing the 
angles or the origin of the cone (right panel). 

Besides these geometric considerations, one must take spe- 
cial care that each galaxy is seen at the evolutionary phase and 
with the photometric properties corresponding to its redshift along 
the lightcone. In the MR, snapshots are separated by ~100-400

Figure 7. Lightcone construction further explained. Left panel: The lightcone in the expanded co-moving coordinate frame. Middle panel: Projection of the
lightcone onto a virtual celestial sphere. Right panel: Galaxies in the lightcone as seen projected on the sky. The color bar on the right illustrates which particular
snapshot was used to populate each of the different sections along the lightcone. For clarity, we only plot the lightcones out to $z \approx 0.3$ (snapnum=52). In
reality our lightcones extend to beyond z = 10 following the same procedure.

Figure 8. Detail of a randomly pointed lightcone measuring 1.4° × 1.4° on the sky in the redshift versus declination plane. Galaxies plotted have SDSS
z'-band magnitudes of < 26.5 (AB).

Figure 9. The effects of galaxy peculiar velocities on the apparent redshifts
of galaxies in the lightcone. The top panel shows declination vs. the geometric
or cosmological redshift for galaxies in and near a massive galaxy
cluster at $z \approx 0.4$. The bottom panel shows the redshift-space distortions
an observer would see due to the peculiar velocities of galaxies moving
through the gravitational potential well of the cluster.



moving distance (or redshift) from the origin. This new technique 
enables us to model observations toward specific objects or regions 
by choosing a location within the simulations volume at one red- 
shift and 'observing' it within a lightcone with origin at another 
redshift. Obvious uses of this technique are to study the appearance
of a particular galaxy cluster selected at z = 0 and observed
at z = 1, or to study the z = 0 descendant of a halo (or galaxy)
selected at z = 6. It is important to take into account the peculiar
velocities of galaxies when constructing the lightcones as they can
heavily distort the observed redshift distributions, especially in the
vicinity of massive objects such as galaxy clusters (see Fig. 9).
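
The redshift-space distortion follows from combining the cosmological redshift with the Doppler shift due to the line-of-sight component of the peculiar velocity. A minimal sketch, assuming the standard non-relativistic relation $1 + z_{obs} = (1 + z_{cos})(1 + v_{los}/c)$; the function and variable names are illustrative and are not taken from MoMaF:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def observed_redshift(z_cos, pos, vel):
    """Apparent redshift including the line-of-sight peculiar velocity.

    z_cos : cosmological (geometric) redshift of the galaxy
    pos   : comoving position vector relative to the observer
    vel   : peculiar velocity vector in km/s
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    los = pos / np.linalg.norm(pos)      # unit vector along the line of sight
    v_los = np.dot(vel, los)             # > 0 means receding from the observer
    return (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0

# A cluster galaxy at z = 0.4 receding at 1000 km/s along the line of sight;
# the transverse velocity component (300 km/s) does not affect the redshift.
z_obs = observed_redshift(0.4, pos=[0.0, 0.0, 1000.0], vel=[300.0, 0.0, 1000.0])
print(z_obs)
```

A velocity dispersion of order 1000 km/s therefore spreads cluster members over a few times 0.001 in redshift at z = 0.4.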

The Millennium Run database allows one great flexibility in
selecting such targets, and even allows one to define the complete
geometry of the lightcone in a single SQL query. Our new
lightcone aiming technique thus greatly enhances the application
of the MR to numerous new problems. Examples related to galaxy
clusters will be shown in Sect. 4.2.



Myr, meaning that the evolving galaxy population is sampled at
fairly frequent intervals out to very high redshifts. Fig. 7 illustrates
how we make use of these snapshots to obtain an evolving
galaxy population as a function of co-moving distance (or redshift)
along the lightcone. Each section consists of those galaxies having
redshifts $(z_i + z_{i+1})/2 > z > (z_i + z_{i-1})/2$, where $z_i$
is the redshift corresponding to snapshot i. The physical properties
that these galaxies have are then also those they have in snapshot i.
Because the large-scale structure does not evolve rapidly
between snapshots, it is safe to neglect any changes occurring in
the distributions of galaxies. The physical properties of the galaxies
can fluctuate heavily between snapshots, but as long as one
is interested in the evolution of the global population this can be
safely ignored (Kitzbichler & White 2007). However, in order to
ensure that "observed" galaxy properties are correctly related to
redshift we perform small interpolations of the observed-frame
magnitudes, shifting each galaxy in both redshift and luminosity
distance from the snapshot corresponding to redshift $z_i$ to the
redshift at which it actually appears on the lightcone (Blaizot et al.
2005; Kitzbichler & White 2007). In addition to this step, we make
corrections to the observed magnitudes due to absorption by the
IGM (see §2.5).
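
The assignment of galaxies to snapshots can be sketched as follows. Each snapshot owns the redshift interval bounded by the midpoints to its two neighbours, so a galaxy at lightcone redshift z is drawn from the snapshot whose interval contains z. The snapshot redshifts below are made-up values for illustration, not the actual MR output list, and the array is ordered from low to high redshift purely for convenience.

```python
import numpy as np

# Hypothetical snapshot redshifts, sorted from low to high redshift
snap_z = np.array([0.0, 0.1, 0.25, 0.45, 0.7, 1.0])

def snapshot_for_redshift(z, snap_z):
    """Index of the snapshot whose midpoint-bounded interval contains z."""
    midpoints = 0.5 * (snap_z[1:] + snap_z[:-1])   # interval boundaries
    return np.searchsorted(midpoints, z)

z_gal = np.array([0.03, 0.06, 0.30, 0.95])
idx = snapshot_for_redshift(z_gal, snap_z)
print(idx, snap_z[idx])
```

Because the boundaries are midpoints, this is equivalent to picking the snapshot closest in redshift; the magnitude interpolation described above would then be applied to the residual offset between z and the snapshot redshift.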

The final step required is to project the cone onto a virtual
sky seen by a fictitious observer placed at the center of the celestial
sphere (middle panel of Fig. 7). It is now straightforward to
assign WCS coordinates (right ascension and declination) to every
object in the cone (see Kitzbichler & White 2007). The projected
large-scale structure can be seen in the sky distribution of galaxies
plotted in the right-hand panel of Fig. 7. Now that we know
both the sky coordinates and the redshifts of every object along the
lightcone, we can show the details of the large scale structure that
would be probed in a deep pencil beam survey as it would appear
in a large galaxy redshift survey out to z ~ 8. In Fig. 8 we plot
the redshifts of objects versus their declination on the sky for one
of the Henriques et al. (2012) lightcones. Points represent galaxies
having z'-band magnitudes brighter than 26.5 (AB) mag.
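
The projection onto the virtual sky amounts to converting comoving Cartesian positions, expressed relative to the observer, into angles. The sketch below assumes that the lightcone axis has already been rotated onto the x-axis, so that the cone centre lies at (RA, Dec) = (0, 0). This axis convention is an assumption made for illustration and may differ from the one used in MoMaF.

```python
import numpy as np

def sky_coordinates(x, y, z):
    """RA, Dec (degrees) and comoving distance for observer-centred positions.

    RA is returned in the range (-180, 180] degrees around the cone axis.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    distance = np.sqrt(x**2 + y**2 + z**2)
    ra = np.degrees(np.arctan2(y, x))
    dec = np.degrees(np.arcsin(z / distance))
    return ra, dec, distance

# One galaxy on the cone axis and one slightly off-axis (positions in Mpc/h)
ra, dec, dist = sky_coordinates([1000.0, 1000.0], [0.0, 10.0], [0.0, -5.0])
print(ra, dec, dist)
```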



2.4.3 Getting the inclinations and position angles right 

One of the unique features and key science drivers of the MRObs 
is that it produces detailed predictions for the observed galaxy pop- 
ulation without having to make assumptions that are not supported 
or naturally accounted for by the model. The SAMs included in 
the MRObs allow us not only to predict morphologies and sizes 
of galaxies, but also their inclinations and position angles as seen 
by a virtual observer. The latter are derived from the orientation of 
the galaxy as defined by the angular momentum vector of its stel- 
lar disk. The SAM that we use here tracks the change in the total 
angular momentum vector of both gas and stellar disks. New gas 
condensing within a halo is assumed to carry the specific angular 
momentum of that halo. The total angular momentum change of 
gas disks in each time step is the sum of the change in angular 
momentum due to gas condensation, gas accretion and gas that is 
transformed into stars. The change in total angular momentum of 
stellar disks is given by the change in angular momentum due to 
gas that gets transformed into stars in each time step. 

As a consequence, the SAM predicts not only the spatial po- 
sitions but also the orientations of all galaxies with respect to the 
three-dimensional, co-moving, Cartesian coordinate system of the 
simulation box. From this we can then calculate the observed in- 
clinations and position angles of each galaxy based on the angles 
between our lightcones and the simulation box. Our method en- 
sures that the orientations of galaxies in the MRObs are, at the very 
least, physically motivated. This allows one to study in detail if the 
MR predicts any observable correlations between the orientations 
of galaxies, their parent halos or the large-scale structure. Such 
models are also suited for, e.g., conducting completeness tests as a 
function of inclination, for testing galaxy structure decomposition 
codes, and for paving the way for more elaborate, orientation-based 
dust screening models that may be implemented into the SAM at a 
later stage. 
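
The geometry can be sketched as follows. Given the angular momentum vector of the stellar disk and the position of the galaxy relative to the observer, the inclination is the angle between the spin axis and the line of sight, and the position angle follows from the spin vector projected onto the plane of the sky. The choice of the simulation z-axis as the 'north' reference and the convention of measuring the position angle of the major axis from north towards east are assumptions made for this illustration.

```python
import numpy as np

def inclination_and_pa(spin, pos, north=(0.0, 0.0, 1.0)):
    """Inclination and major-axis position angle (degrees) of a disk.

    spin  : angular momentum vector of the stellar disk (box coordinates)
    pos   : galaxy position relative to the observer (box coordinates)
    north : reference direction defining 'up' on the sky (assumed; it must
            not be parallel to the line of sight)
    """
    spin = np.asarray(spin, dtype=float)
    spin = spin / np.linalg.norm(spin)
    los = np.asarray(pos, dtype=float)
    los = los / np.linalg.norm(los)
    # Inclination: 0 deg is face-on, 90 deg is edge-on
    cos_i = np.clip(abs(np.dot(spin, los)), 0.0, 1.0)
    incl = np.degrees(np.arccos(cos_i))
    # Orthonormal basis (east, north) spanning the plane of the sky
    north = np.asarray(north, dtype=float)
    n_sky = north - np.dot(north, los) * los
    n_sky = n_sky / np.linalg.norm(n_sky)
    e_sky = np.cross(n_sky, los)
    # The projected minor axis lies along the projected spin vector;
    # the major axis is perpendicular to it.
    minor_angle = np.arctan2(np.dot(spin, e_sky), np.dot(spin, n_sky))
    pa = (np.degrees(minor_angle) + 90.0) % 180.0
    return incl, pa

face_on = inclination_and_pa(spin=[1.0, 0.0, 0.0], pos=[100.0, 0.0, 0.0])
edge_on = inclination_and_pa(spin=[0.0, 0.0, 1.0], pos=[100.0, 0.0, 0.0])
print(face_on, edge_on)
```

For a face-on disk the position angle is undefined; the value returned in that case is arbitrary.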



2.4.2 Aiming at a Specific Object 

We have made a small modification to the MoMaF code that allows
for the construction of lightcones not only in (semi-)arbitrary
directions as described above, but also to 'aim' a lightcone such that
it crosses through a specific point of the MR box at a specific co- 



2.5 IGM Absorption models 

The spectra of galaxies shortward of 1216 Å in the rest-frame are
primarily affected by photoelectric absorption by the neutral hydrogen
associated with damped Ly$\alpha$ absorbers (DLAs), Lyman Limit
Systems (LLSs), optically thin systems, and resonance line scattering
by the Ly$\alpha$ forest along the line of sight$^3$. This absorption
affects the magnitudes and colours of galaxies observed in
bands corresponding to these rest wavelengths. The strength and
shape of this so-called "Lyman Break" depend mainly on the redshift
of the source, and the distributions in redshifts and optical
depths of the intervening absorbers. In order to ensure that these
effects are properly accounted for in the MRObs lightcones, at
least in a statistical manner, we have implemented three different
models for the IGM absorption. We include two models based
on the recent IGM transmission calculations by Meiksin (2006,
"MEIKSIN") and Inoue & Iwata (2008, "INOUE-IWATA") that are
conveniently made available in code form by Harrison et al. (2011,
IGMtransmission). We also include the IGM transmission
model of Madau (1995, "MADAU") that is still the most widely
used in the literature today even though it has been shown to
significantly over-predict the absorption in the 912-1216 Å range
compared to the updated models (e.g., Bershady et al. 1999; Meiksin
2006; Inoue & Iwata 2008). Because the inclusion of the IGM
attenuation is so important for creating realistic mock catalogs and
images in the MRObs, here we will give a brief review of the
modeling recipes.

The MEIKSIN and INOUE-IWATA models are both based on
a Monte Carlo approach that distributes LLSs chosen from a redshift
distribution $dN/dz$ and an optical depth distribution $dN/d\tau$
(both constrained by observations), and averages over the IGM
transmission measured along a large number of random lines of
sight. The IGM effective optical depth $\tau_e$ at observed wavelength $\lambda$
is taken to be the sum of the optical depth due to LLSs, the optically
thin IGM and the Ly$\alpha$ forest as follows:

$$\tau_e(\lambda) = \tau_L^{\rm LLS}(\lambda) + \tau_L^{\rm IGM}(\lambda) + \sum_n \bar{\tau}_n(\lambda).$$

Figure 10. Attenuation of the UV continuum shortward of Ly$\alpha$ due to
neutral hydrogen along the line of sight affects the colors of high redshift
galaxies. Panels show the average transmission of the IGM according to
the analytic approximation given by Madau (1995, blue dashed curves),
and the more recent Monte Carlo modeling techniques of Meiksin (2006,
black solid curves) and Inoue & Iwata (2008, red dotted curves) for example
galaxies at z = 1 (top panel), z = 3 (middle panel), and z = 5 (bottom
panel). The MRObs offers the choice between the different IGM
implementations.



The optical depth due to photoelectric absorption by LLSs 
along the line of sight to a source at redshift z is given by

with $z_L = \lambda/912 - 1$, and $\partial^2 N/\partial z\,\partial\tau$ the number of absorbers per
unit redshift and optical depth. In the MEIKSIN model the LLSs
are randomly drawn from the distributions

400, $\gamma_1 = 0.2$, $\gamma_2 = 2.5$, $\gamma_3 = 4$, $z_1 = 1.2$, and
$z_2 = 4$. The mean IGM transmission due to the LLSs typically
stabilizes after averaging over ~10,000 random lines of sight.
Following Meiksin (2006) and Harrison et al. (2011), both models include
a static contribution from the diffuse or optically thin IGM and the
Ly$\alpha$ forest:

$$\tau_L^{\rm IGM}(\lambda) = 0.07553\,(1+z_L)^{4.4}\left[\frac{1}{(1+z_L)^{3/2}} - \frac{1}{(1+z)^{3/2}}\right], \qquad (7)$$

$$\bar{\tau}_n(\lambda) = -\ln\left\langle \exp\left(-\tau_n(\lambda)\right)\right\rangle, \qquad (8)$$

where the Lyman transitions $n \rightarrow 1$ up to n = 31 are included.

3 In our model approximations we neglect the much smaller contribution
from intergalactic metals and He absorption.

For completeness, we also give the MADAU modeling approximation
(Madau 1995):

with $A_j = (0.0036, 0.0017, 0.0012, 0.00093)$ for $\lambda_j =
(1216, 1026, 973, 950)$ Å, $x_c = 1 + z_c$, $x_{em} = 1 + z_{em}$, $z_c =
\lambda/\lambda_L - 1$, and $z_{em}$ is the redshift of the source. Eq. 10 is the
approximation given for Eq. 16 in Madau (1995, see footnote 3 in that
paper).

For an intrinsic galaxy spectrum $f_\lambda$, the attenuated spectrum
observed will be of the form $f_{\lambda,e} = f_\lambda\,e^{-\tau_e(\lambda)}$, where the
effective optical depth of the IGM transmission function can be
taken from any of the three IGM models. In Fig. 10 we show the
mean transmissions for sources at different redshifts. The MADAU
model implies significantly less transmission than the other models.
In the MRObs we compute the mean absorption in each filter assuming
a 100 Myr old, continuously star-forming, solar metallicity
template spectrum$^4$ modeled using Starburst99, and apply magnitude
corrections according to the redshift of each simulated galaxy.
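
As an illustration of how such a filter correction can be computed, the sketch below attenuates a spectrum with the Lyman-series line-blanketing term of the MADAU model and derives the resulting magnitude change in a filter. It uses the four coefficients $A_j$ quoted above together with the power-law form $\tau = \sum_j A_j (\lambda_{obs}/\lambda_j)^{3.46}$ from Madau (1995). This is a simplified sketch and not the MRObs implementation: continuum absorption blueward of the Lyman limit is omitted, and the flat input spectrum and top-hat filter are placeholders for the Starburst99 template and real filter curves.

```python
import numpy as np

A_J = np.array([0.0036, 0.0017, 0.0012, 0.00093])     # Ly-alpha ... Ly-delta
LAMBDA_J = np.array([1216.0, 1026.0, 973.0, 950.0])   # rest wavelengths (Angstrom)

def lyman_series_tau(lam_obs, z_em):
    """Effective optical depth from Lyman-series line blanketing only."""
    lam_obs = np.atleast_1d(np.asarray(lam_obs, dtype=float))
    tau = np.zeros_like(lam_obs)
    for a_j, lam_j in zip(A_J, LAMBDA_J):
        # Line j only absorbs blueward of its wavelength at the source redshift
        absorbed = lam_obs < lam_j * (1.0 + z_em)
        tau[absorbed] += a_j * (lam_obs[absorbed] / lam_j) ** 3.46
    return tau

def trapezoid(y, x):
    """Trapezoidal integral of y(x)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def igm_magnitude_correction(lam, flux, response, z_em):
    """Magnitude change (positive means fainter) of a source seen through a filter."""
    transmission = np.exp(-lyman_series_tau(lam, z_em))
    # Photon-counting weights: integrate f_lambda * R * lambda
    attenuated = trapezoid(flux * transmission * response * lam, lam)
    intrinsic = trapezoid(flux * response * lam, lam)
    return -2.5 * np.log10(attenuated / intrinsic)

lam = np.linspace(3800.0, 6000.0, 2201)          # observed wavelength (Angstrom)
flux = np.ones_like(lam)                         # placeholder flat spectrum
response = ((lam > 4000.0) & (lam < 4800.0)).astype(float)  # toy top-hat filter
delta_m = igm_magnitude_correction(lam, flux, response, z_em=3.0)
print(delta_m)
```

Tabulating `delta_m` on a grid of redshifts for every filter gives a lookup table of the kind that can be applied to catalog magnitudes after the fact.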

Although all data produced by the MRObs include the IGM 
transmission model, we note that the magnitudes in the lightcone 
catalogs provided by the MR database typically come without any IGM
transmission corrections applied. We have stored our IGM correc- 
tions as a function of redshift and filter in the MR database, such 
that they can be conveniently applied to any of the MR lightcone 
catalogs that are available for download. 



2.6 Construction of the virtual telescope data 

2.6.1 Galaxy models

Now that we have obtained all the necessary information pertain- 
ing to the positions, sizes, viewing angles, bulge-to-disk ratios, and 
IGM-corrected magnitudes across different filters, we can popu- 
late simulated images with galaxies. We follow a two-step process. 
First we simulate noise-free galaxy profiles projected onto a 2D 
image plane at very high pixel resolution using a modified version$^5$
of Skymaker (Bertin 2009). We will refer to the result of this process
as the "perfect" or "pre-observation" image. Once the perfect
image has been made, it is straightforward to apply all the observational
effects such as the PSF, binning, sky background, and noise
for any type of observation. This last step is done using our own
custom code$^6$.

In line with the Guo et al. (2011) semi-analytic predictions,
galaxies in the MRObs are composed of an exponential profile for
the disk (D) and a de Vaucouleurs profile (S) for the bulge (if any),
each having a surface brightness profile $\mu(R)$ in mag arcsec$^{-2}$
given by:

$$\mu_S(R) = m - 2.5\log_{10}(B/T) + 8.3268\,(R/R_e)^{1/4} + 5\log_{10}(R_e) - 4.9384 \qquad (11)$$

$$\mu_D(R) = m - 2.5\log_{10}(1 - B/T) + 1.0857\,(R/R_h) + 5\log_{10}(R_h) + 1.9955 \qquad (12)$$

where m is the total magnitude (mag), B/T is the bulge-to-total
ratio, $R_e$ is the bulge half-light radius (arcsec), and $R_h$ the disk
scale length (arcsec). Skymaker builds these profiles as elliptical
shapes at pixel position $x' = x - x_c$, $y' = y - y_c$ projected on
the sky with position angle $\theta$ and inclination $\phi$ according to (see
Bertin & Arnouts 1996):

$$R^2 = C_{XX}\,x'^2 + C_{YY}\,y'^2 + C_{XY}\,x'y' \qquad (13)$$

such that the algorithm for calculating the projected light profiles
for disks and bulges becomes:



4 The IGM attenuation depends somewhat on the intrinsic shape of the 
source spectrum. Although we could, in principle, apply different IGM cor- 
rections according to the range of spectral types found for galaxies in the 
MRObs, we neglect this here in order to speed up the computations as the 
lightcones typically contain millions of galaxies. 

5 We optimized Skymaker for dealing with very large input lists in .csv 
format provided by the MRDB, and for generating extremely large images. 

6 Although Skymaker was specifically designed to handle point spread 
function convolution, sky backgrounds, and simulating detector noise, for 
various practical reasons we do not currently make use of this functional- 
ity but use our own custom IDL and Python codes for these steps of the 
simulation. 



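$$I_D[x',y'] \propto \exp\left[-\left(C_{XX}\,x'^2 + C_{YY}\,y'^2 + C_{XY}\,x'y'\right)^{1/2}\right]$$

$$I_S[x',y'] \propto \exp\left[-7.6692\left(C_{XX}\,x'^2 + C_{YY}\,y'^2 + C_{XY}\,x'y'\right)^{1/8}\right] \qquad (14)$$

with

$$C_{XX} = \frac{\cos^2(\theta)}{A^2} + \frac{\sin^2(\theta)}{B^2}$$

$$C_{YY} = \frac{\sin^2(\theta)}{A^2} + \frac{\cos^2(\theta)}{B^2}$$

$$C_{XY} = 2\cos(\theta)\sin(\theta)\left(\frac{1}{A^2} - \frac{1}{B^2}\right)$$

A and B are the projected major and minor axes, with $A = R_h$ for
disks and $A = R_e$ for bulges, and $B = A\cos(\phi)$ with $\cos(\phi)$ the
projected aspect ratio of the system.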
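
Equations 11-13 translate directly into code. The sketch below evaluates the ellipse coefficients and the dimensionless elliptical radius of Eq. 13, implements the two surface brightness profiles, and checks numerically that the face-on disk profile of Eq. 12 integrates to the disk magnitude (the coefficient 1.0857 equals $2.5\log_{10}e$). It is a stand-alone illustration with made-up parameter values and is not part of Skymaker.

```python
import numpy as np

def ellipse_coefficients(theta, a, b):
    """C_XX, C_YY, C_XY for position angle theta (rad) and semi-axes a, b."""
    c, s = np.cos(theta), np.sin(theta)
    cxx = c**2 / a**2 + s**2 / b**2
    cyy = s**2 / a**2 + c**2 / b**2
    cxy = 2.0 * c * s * (1.0 / a**2 - 1.0 / b**2)
    return cxx, cyy, cxy

def elliptical_radius(dx, dy, theta, a, b):
    """Dimensionless radius R of Eq. 13 (R = 1 on the ellipse with axes a, b)."""
    cxx, cyy, cxy = ellipse_coefficients(theta, a, b)
    return np.sqrt(cxx * dx**2 + cyy * dy**2 + cxy * dx * dy)

def mu_disk(r, m, bt, r_h):
    """Exponential disk surface brightness (mag/arcsec^2), Eq. 12."""
    return (m - 2.5 * np.log10(1.0 - bt) + 1.0857 * (r / r_h)
            + 5.0 * np.log10(r_h) + 1.9955)

def mu_bulge(r, m, bt, r_e):
    """De Vaucouleurs bulge surface brightness (mag/arcsec^2), Eq. 11."""
    return (m - 2.5 * np.log10(bt) + 8.3268 * (r / r_e) ** 0.25
            + 5.0 * np.log10(r_e) - 4.9384)

# Consistency check: integrate the face-on disk profile over the sky
m, bt, r_h = 22.0, 0.3, 0.5                      # made-up example values
r = np.linspace(0.0, 30.0 * r_h, 200001)         # radius in arcsec
intensity = 10.0 ** (-0.4 * mu_disk(r, m, bt, r_h))
integrand = 2.0 * np.pi * r * intensity
flux = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
m_disk_recovered = -2.5 * np.log10(flux)
m_disk_expected = m - 2.5 * np.log10(1.0 - bt)
print(m_disk_recovered, m_disk_expected)
```

With $\theta = 0$ the major axis lies along x', so the point (A, 0) has R = 1; an inclined disk is obtained by passing `b = a * cos(phi)`.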

Our modified version of Skymaker performs this process effi- 
ciently for typical MRObs simulations that are based on lightcones 
containing several millions of galaxies per square degree. An example
of the "perfect image" produced is shown in the left panel of
Fig. 11, where the white shapes indicate the simulated galaxy images.
The corresponding final ("noisy") telescope image produced
following the process detailed below is shown in the middle panel.

2.6.2 Input parameters (positions, magnitudes, inclinations,
orientations, sizes, and bulge-to-disk ratios)

The center positions of all objects in the image plane are determined
from the right ascension and declination relative to the lightcone
centres and the pixel scale of the desired instrument. Inclinations
and position angles are uniquely determined from the angular
momentum vector of the stellar disks relative to the orientation
direction of the lightcones through the MR volume (see Figure 6).
Angular sizes are determined from the physical size and the angular
diameter distance, $D_A$, at the redshift of each source in the lightcone.

We list the specific parameters required by Skymaker for simulating
each galaxy, and give a brief explanation of how each parameter
follows from our models.

• x,y : The source position in image pixel coordinates. This po- 
sition is defined by the sky coordinates of a galaxy in the lightcones, 
the desired pixel scale of the image, the field of view, and the posi- 
tion of the image center relative to the lightcone center. 

• m : The total apparent (AB) magnitude of the source in the 
desired filter. This magnitude includes the attenuation by dust as 
well as the IGM absorption. 

• B/T : The bulge-to-total ratio of the source. This parameter,
for which we take the ratio of the fluxes predicted for the bulge and
total in each filter, is needed for assigning magnitudes to the bulge
(m_b = m - 2.5 log10(B/T)) and disk (m_d = m - 2.5 log10(1 - B/T))
components.

• R_h,disk : The scale length of the disk in arcseconds.
This is defined by stellardiskradius/(3 D_A), with
stellardiskradius taken from the Guo2010a..MR
table⁷ in the MRDB and given in units of kpc.

• R_e,bulge : The equivalent (or half-light) radius of the bulge
measured in arcseconds. This is calculated as bulgesize/D_A,
where bulgesize is taken from the Guo2010a..MR table and
is in units of kpc.

7 The Guo2010a..MR table stores the galaxy catalogue obtained by
applying the SAM from Guo et al. (2011) to the MR halo merger trees.
Figure 11. Single filter simulated image constructed from a lightcone. Left panels: The "perfect" image modeled using Skymaker. Galaxies consisting of
disks and bulges are placed at the proper position, inclination, orientation, brightness and apparent size that are all uniquely determined by the semi-analytical
model and the angles of intersection between the lightcone and the simulation volume. The only information that is not constrained by the model is the bulge
shape, which we set to spherical. Middle panels: The perfect image as seen by our telescope simulator. Here we show a mock HST/WFC3 F160W .FITS
image having the same detector properties, point spread function, sky background, and signal-to-noise as the ERS observations (a 1.5' x 1.5' region is shown
at a spatial binning of 0.09" pixel⁻¹). No stars were added to this observation. Right panels: The SExtractor "segmentation" image showing the locations and
shapes of objects that were detected in the simulated image. Panels on the bottom row show a zoom of the full images shown in the top row. Although there is
a good correspondence between objects seen in the simulated telescope image and objects detected by SExtractor, the perfect image that was used as the input
for the image simulation contains many more sources that are too faint to be seen in the simulated image.



• cos(φ_disk) : The projected aspect ratio of the disk that is
uniquely determined by the angles of intersection of the lightcone 
with the MR volume and the intrinsic spin axis of the galaxy stellar 
disk. 

• θ_disk : The position angle of the disk, defined by the angles of
intersection of the lightcone with the MR volume and the intrinsic 
spin axis of the galaxy stellar disk. 

• cos(φ_bulge) : The projected aspect ratio of the bulge. Because
all bulges in Guo et al. (2011) are spherical, we set this value to 1.0.

• θ_bulge : The position angle of the bulge. Because all
bulges in Guo et al. (2011) are spherical, we set this value to 0.0.
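
The conversions in the list above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the MRObs code: the function and argument names are hypothetical, and the angular diameter distance is assumed to be supplied in kpc so that a size divided by D_A is an angle in radians.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0

def skymaker_inputs(m_total, bulge_to_total, disk_radius_kpc, bulge_size_kpc, d_a_kpc):
    """Convert catalogue quantities into per-component image parameters.

    All names are hypothetical. d_a_kpc is the angular diameter distance
    in kpc, so size / d_a_kpc is an angle in radians.
    """
    params = {}
    if bulge_to_total > 0.0:
        # m_b = m - 2.5 log10(B/T)
        params["m_bulge"] = m_total - 2.5 * math.log10(bulge_to_total)
        # R_e = bulgesize / D_A, converted from radians to arcseconds
        params["r_e_bulge"] = bulge_size_kpc / d_a_kpc * ARCSEC_PER_RADIAN
    if bulge_to_total < 1.0:
        # m_d = m - 2.5 log10(1 - B/T)
        params["m_disk"] = m_total - 2.5 * math.log10(1.0 - bulge_to_total)
        # R_h = stellardiskradius / (3 D_A), converted to arcseconds
        params["r_h_disk"] = disk_radius_kpc / (3.0 * d_a_kpc) * ARCSEC_PER_RADIAN
    return params
```

Pure disks (B/T = 0) and pure bulges (B/T = 1) are handled by skipping the missing component, which avoids taking the logarithm of zero; this is a choice made for the sketch.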

2.6.3 The Virtual Telescope Model (sky, PSF, noise, and all that) 

The MRObs produces realistic telescope data by applying an "ob- 
servation description" (OD) to the perfect image created in the pre- 
vious step. The OD consists of a set of instructions that completely 
defines a particular observation to be mimicked, e.g.: telescope, de- 
tector, filter, exposure time, number of sub-exposures, dither strat- 
egy, and sky conditions. Although the exact modeling method may 
vary depending on the details of a specific instrument or survey, 



here we list the basic observational effects typically being added in 
sequence: 

(1) The first step is to scale the perfect image populated by
our bulge+disk surface brightness simulations to their proper fluxes
measured in detector electrons by multiplying the models in Eq. [14]
by the factor

F_{e^-} = 10^{-0.4(m_{\rm AB} - ZP)} \, T_{\rm exp} \, G \, / \sum_{i,j} I(x'_i, y'_j), (15)

where m_AB is the AB magnitude of the disk/bulge, ZP is the zero-point
in AB magnitudes that gives a detector count rate of 1 ADU
s⁻¹, T_exp is the image exposure time in seconds, G is the detector
gain in e⁻ ADU⁻¹, and x'_i and y'_j are the coordinates of pixel i, j
belonging to each source. (2) We add a sky background. The value
of the background is usually kept constant across the field (we use 
gnomonic projections) based on the average conditions at a partic- 
ular site or telescope, or is based on the sky background level mea- 
sured in a particular survey that is being modeled. (3) The image 
is convolved with a point spread function (PSF). The PSF can have 
various origins: it can be taken from a PSF simulator (e.g. Tiny- 
Tim in the case of HST), from (a stack of) stars extracted from a 
fully reduced observation, or modeled with a simple function (e.g.,
a Gaussian). (4) The image is rebinned to the desired pixel scale. If 
the PSF is taken from an actual observation and is not available at 
sub-pixel resolution, the rebinning step is performed before the PSF 
convolution step. (5) Detector dark current is added to the image. 
(6) Poisson noise is calculated for each pixel value. (7) Gaussian- 
distributed readout-noise is added. (8) WCS astrometry is added 
to the image header based on the pixel scale and the astrometric 
system of the lightcone. (9) Scientific images in .FITS format are 
created, optionally with corresponding background and noise maps. 
Complex observations having the proper noise characteristics can
be created from co-adds of multiple exposures made following the
same procedures described above.
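
The core of this sequence (sky, PSF convolution, dark current, Poisson noise, and read-out noise) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the MRObs implementation: the input image is assumed to be already scaled to electrons (step 1), the PSF is a simple Gaussian, the convolution is circular, and rebinning, WCS headers, and FITS output (steps 4, 8, and 9) are omitted. All function names are hypothetical.

```python
import numpy as np

def gaussian_psf(size, fwhm_pix):
    """Normalized 2-D Gaussian PSF on a size x size grid (size should be odd)."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    psf = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def convolve_fft(image, kernel):
    """Circular FFT convolution with the kernel centre moved to the origin."""
    padded = np.zeros_like(image, dtype=float)
    ky, kx = kernel.shape
    padded[:ky, :kx] = kernel
    padded = np.roll(padded, (-(ky // 2), -(kx // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def observe(perfect_e, sky_e, dark_e, read_noise_e, psf, rng):
    """Turn a noiseless source image (electrons) into a noisy mock exposure."""
    image = perfect_e + sky_e              # step 2: constant sky background
    image = convolve_fft(image, psf)       # step 3: PSF convolution
    image = image + dark_e                 # step 5: accumulated dark current
    # step 6: Poisson noise; clip tiny negative values from FFT round-off
    image = rng.poisson(np.clip(image, 0.0, None)).astype(float)
    # step 7: Gaussian-distributed read-out noise
    image += rng.normal(0.0, read_noise_e, image.shape)
    return image
```

A call such as `observe(img, 50.0, 5.0, 3.0, gaussian_psf(9, 2.0), np.random.default_rng(0))` returns one mock exposure; co-adding several such exposures would mimic a multi-exposure observation.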

The middle panels of Fig. [11] show a mock HST/WFC3 H160-band
image corresponding to the perfect image shown on the left.
The mock HST image was modeled after the H160-band observations
of the GOODS ERS survey of Windhorst et al. (2011).



2.6.4 Galactic Extinction and Stars

Optionally, we apply Galactic foreground extinction to the input
galaxy models by specifying the amount of reddening in units of
E(B - V) and assuming the Cardelli et al. (1989) attenuation curve
with R_V = 3.1. If desired, Galactic stars can be added to the
image, either based on a user-specified input distribution or based
on an accurate Milky Way model (e.g., TRILEGAL; Girardi et al.
2005).



2.7 Source Extractor 

With the synthetic images produced in the previous section, it is
straightforward to analyze the data in a manner analogous to real
observations. Sources in the images are detected by using the Source
Extractor (SExtractor) software (Bertin & Arnouts 1996), which efficiently
decomposes a pixel image into 'objects' detected at some specified
threshold of flux above the image background. Photometry and
other basic measurements are performed on all the detected objects
yielding a source catalog corresponding to the image. The exact
way in which objects are defined and how measurements are performed
depend on the settings of various SExtractor parameters,
while the total number of objects that can be recovered from
the image and the errors on their photometry largely depend on the 
image quality itself. The MRObs makes it convenient to test the 
different detection and photometry techniques available in the lit- 
erature, especially because the properties of the galaxies that were 
used to create the mock image are exactly known (as opposed to 
galaxies in real observations). 

We have run SExtractor on the mock HST/WFC3 H160-band
image shown in Fig. [11] (middle panels). Panels on the right show
the SExtractor "segmentation image", indicating all the objects that 
were detected in the mock image. While there is good correspon- 
dence between the two (nearly all objects seen in the mock im- 
age are also seen in the detection image), the perfect (input) im- 
age shown on the left contains many more galaxies, most of which 
are too faint to be detected in the mock observation. By cross- 
correlating the positions of detected objects listed in the SExtractor 



8 We do not currently include the effects of bleeding, blooming and sat- 
uration, but we note that the original version of Skymaker is capable of 
simulating these effects for those that are interested. 
Figure 12. 'Trumpet' diagram showing the difference between the input
magnitudes (from the MR lightcones) and the magnitudes measured by run- 
ning SExtractor on a simulated HST/WFC3 H-band image. 

output catalogs with the positions of objects in the underlying light- 
cone (both available in the MRDB), we can find out which of the 
semi-analytic galaxies (identified by their GALAXY ID) were de- 
tected in the image. This enables us to perform various diagnostic 
tests between measurements extracted from the synthetic observa- 
tions and the corresponding intrinsic physical properties from the 
lightcone. One such test is to study how well the real magnitudes 
are recovered from the synthetic images by SExtractor. In Fig. [12] 
we show a so-called 'trumpet' diagram indicating the difference in 
magnitude between the 'true' input value and the total magnitude 
given by SExtractor. The test shows quantitatively how both the
amount of flux lost due to missed light and the photometric
scatter due to noise increase toward fainter magnitudes.

Because the cross-match between the SExtractor catalog and 
the lightcone catalog gives us the GALAXY ID of each galaxy in 
the images, this provides us also with a direct link to all the avail- 
able physical quantities in the semi-analytic snapshot catalogs, the 
dark matter halo catalogs, and the underlying dark matter density 
fields, such that it becomes possible to perform numerous experi- 
ments related to how well we can extract such physical parameters 
starting from any kind of observation that can be modeled using the 
MRObs. 
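
As an illustration of the cross-match and the magnitude comparison described above, the following Python sketch matches detections to input galaxies by position and returns the magnitude differences that would populate a 'trumpet' diagram. It is a stand-in under stated assumptions: positions are taken to be in image pixel coordinates, the matching is a brute-force nearest-neighbour search suitable only for small catalogs, and all function names are hypothetical.

```python
import numpy as np

def cross_match(x_det, y_det, x_in, y_in, max_sep_pix):
    """For each detection, return the index of the nearest input galaxy,
    or -1 if none lies within max_sep_pix."""
    dx = x_det[:, None] - x_in[None, :]
    dy = y_det[:, None] - y_in[None, :]
    sep = np.hypot(dx, dy)
    nearest = np.argmin(sep, axis=1)
    matched = sep[np.arange(len(x_det)), nearest] <= max_sep_pix
    return np.where(matched, nearest, -1)

def magnitude_residuals(m_measured, m_input, idx):
    """Measured minus input magnitude for the matched detections only."""
    ok = idx >= 0
    return m_measured[ok] - m_input[idx[ok]]
```

Plotting the residuals against the input magnitudes of the matched galaxies gives the diagram; the matched indices also identify which input galaxies were recovered.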



3 EXAMPLE: SIMULATING CANDELS DATA 

Large extra-galactic surveys often have complicated tiling patterns, 
exposure time variations, and masked regions across their total field 
of view that complicate the analysis. It can be convenient to include
these kinds of effects in the image simulation. This ensures that the
signal-to-noise properties and the geometry of the real and mock
data sets are comparable. Here we will illustrate the technique that 
we use to accomplish this by performing a mock image simulation 
of the ongoing HST multi-cycle treasury program Cosmic Assem- 
bly Near-Infrared Deep Extragalactic Legacy Survey (CANDELS; 
Figure 13. CANDELS UDS inverse variance weight maps. The total field
of view measures 23' x 10', and is constructed from 44 individual tiles
observed with HST/WFC3 in the filters F125W and F160W.
Figure 15. Simulated and real CANDELS UDS data in the filters F125W
(top) and F160W (bottom). At a qualitative level the images appear already 
remarkably similar. Note that this is not only the result of our accurate im- 
age simulation technique, but also because our input galaxy population ap- 
parently has a striking resemblance compared with the observed one (e.g., 
in terms of number density, clustering, size and shape distributions, and 
brightness). Shown here is a region of about 2' x 2' extracted from the 
wider 23' x 10' UDS field. 



Figure 14. Images of the PSF in the filters F125W and F160W of the CAN- 
DELS UDS field. Our mock "perfect" images are convolved with these 
PSFs. 



Grogin et al. 2011; Koekemoer et al. 2011; HST Programs 12060-12064,
12440; PI: S. M. Faber).



3.1 The CANDELS observations 

Part of the ongoing HST CANDELS program, the UKIDSS Ultra-Deep
Survey (UDS) field measures approximately 23' x 10' and was observed
in the filters F125W and F160W with the WFC3 on HST. This field
of view is covered with 44 individual HST/WFC3 pointings,
resulting in the tiling pattern shown in Fig. [13]. For each tile, four
exposures were obtained in both filters, resulting in average total
exposure times across the field of 1900 s in F125W and 3300 s
in F160W. The data were combined onto a common output frame
measuring about 22,000 x 10,000 pixels with a pixel scale of 0.06"
using the MULTIDRIZZLE software (Fruchter et al. 2009). The
resulting PSF in the drizzled data measures 0.12" (F125W) and 0.18"
(F160W) in FWHM (Fig. [14]). How well can we simulate this kind
of data based on cosmological simulations using the MRObs?



3.2 The CANDELS simulation 

Using the procedures outlined in §[2.6], we can produce highly accurate
mock "CANDELS" data in a number of complementary ways.

(1) The first and most cumbersome method would be to pro- 
duce each individual CANDELS tile at the correct telescope posi- 
tion and roll angle, and then to process the entire data set through 
MULTIDRIZZLE, analogous to the processing performed on the
real data. While this is certainly possible, for many scientific appli- 
cations a good match between the simulated and real data sets can 
already be obtained by side-stepping the laborious drizzling pro- 
cess. 

(2) The simplest and most straightforward way is to directly 
generate mock images the size of the entire UDS field based on our 
model for the HST/WFC3 camera, the main UDS survey parame- 
ters, and a mock lightcone as input. This method produces mock 
UDS images for which the properties (e.g., noise, resolution) are, 
on average, very similar to those of the real survey. This is an ex- 
tremely fast method for generating mock data sets that are approxi- 
mately similar to the observations that are being modeled. It is also 
a powerful method to simulate images for a survey that has not (yet) 
been performed, or for simulating a survey at an arbitrary depth or 
field size. 

(3) Our third method, the one that we will use for our demon- 
stration, is an extremely powerful technique for generating a more 
precise simulation in which the pixel-to-pixel noise variations and 
geometry of the simulated images can be exactly matched to those 
of the real data. For this method we make use of "weight maps" as- 
sociated with the science data for many surveys. The CANDELS 
UDS weight maps (shown in Fig. [13]) record the inverse variance
of each pixel calculated during the image reduction process
Figure 16. The background noise versus aperture size in the real CANDELS
images (red solid lines) and our mock CANDELS images (blue
dashed lines). The noise in the mock and real data is nearly identical. The
true image noise in the absence of correlated noise introduced by the
drizzling process is somewhat higher (blue dotted lines).



(Koekemoer et al. 2011). The HST inverse variance images are usually
calculated as follows:

\mathrm{Inverse\ Variance} = \frac{(f\,t)^2}{(D + f B) + \sigma_{\rm ron}^2}, (16)



where f is the inverse flat-field, t is the exposure time, D is the
accumulated dark current, B is the accumulated background, and
σ_ron is the read-out noise (Koekemoer et al. 2011). The weight
map includes all sources of instrumental and background noise, but 
not that of the science objects themselves to allow proper photom- 
etry with tools like SExtractor. As a first step we therefore produce 
simulated images that include the PSF-convolved objects (includ- 
ing the Poissonian object noise) but not the simulated background 
and read-noise we would normally apply. Instead, we add in these 
sources of noise by directly taking them from the inverse variance 
maps. As a final step we need to take into account that in the real 
CANDELS images the noise is spuriously correlated as a result of 
the drizzling process used to combine the many individual expo- 
sures. The amount of noise correlation depends on the multidrizzle 
parameters, which for the CANDELS UDS data amounts to a pixel
rms noise reduction of a factor of 2 (Casertano et al. 2000). We
introduce this noise correlation in our mock images by smoothing
the mock images with a small Gaussian kernel (of about 1.5 pixels 
FWHM, in this case). 
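
The two noise steps of this third method can be sketched as follows. This is an illustration under stated assumptions rather than the MRObs code: the weight map is taken to hold the inverse variance in the same units as the image, zero-weight pixels are simply left noise-free, and the smoothing is a separable Gaussian with a hypothetical truncation at four standard deviations. The sketch does not attempt to rescale the noise so that the final pixel rms matches a particular target.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def add_weightmap_noise(image, weight, rng):
    """Add Gaussian noise with per-pixel variance 1/weight.

    Pixels with zero weight (no coverage) receive no noise here; how such
    pixels should be treated is a choice made for this sketch.
    """
    sigma = np.zeros(weight.shape, dtype=float)
    covered = weight > 0
    sigma[covered] = 1.0 / np.sqrt(weight[covered])
    return image + sigma * rng.standard_normal(image.shape)

def gaussian_smooth(image, fwhm_pix):
    """Separable Gaussian smoothing, used to mimic drizzle-correlated noise."""
    sigma = fwhm_pix * FWHM_TO_SIGMA
    half = int(np.ceil(4.0 * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # convolve along rows, then along columns
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")
```

Smoothing white noise in this way lowers the pixel-to-pixel rms while leaving the noise measured in large apertures nearly unchanged, which is the qualitative signature of correlated noise discussed in the text.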

In Fig. [15] we show a portion of the final simulated CANDELS 
images in F125W and F160W (left panels). In the panels on the 
right, we show a region of the real CANDELS UDS images, dis- 
played at the same zoom level and at the same color stretch as the 
mock images shown on the left. At a qualitative level the images 
are remarkably similar. Note that this is not only the result of our 
accurate image simulation technique, but also because our input 
galaxy population apparently has a striking resemblance to the ob- 
served one (e.g., in terms of number density, clustering, size and 
shape distributions, and brightness). However, before we can com- 
pare the galaxy populations in the simulated and the real data, we 
Figure 17. The isophotal S/N versus the total magnitude as measured by
SExtractor in the mock and the real CANDELS images are nearly identical. 



need to ensure that the image properties of our simulated data are 
indeed quantitatively similar to the real data. In Fig. [16] we show 
the measured background noise fluctuations as a function of aper- 
ture diameter as measured in the real CANDELS images (red solid 
lines) versus that measured in our simulated data set. The blue dot- 
ted lines indicate the (true) noise level in the absence of correlated 
noise. When we introduce the correlated noise resulting from the 
drizzling process, we get a near perfect match between the simu- 
lated (blue dashed lines) and real (red solid lines) CANDELS UDS 
images. As a second test, we look at the distribution of signal-to- 
noise (S/N) for objects detected in the real and simulated images. 
We ran SExtractor using identical detection parameters on the real 
and simulated images, and plot the isophotal S/N versus the mea- 
sured magnitudes. The result is shown in Fig.[T7]for the mock data 
(left panels) and the real data (right panels). Again, the S/N distri- 
butions are very similar between the real and simulated data, indi- 
cating that our image simulations are accurate. 

In §[4.1] we show an application of these CANDELS simulations
by comparing the galaxy number counts in our semi-analytic
mock lightcones with those extracted from our mock images, and
with those in the real CANDELS images. The simulated CANDELS
data produced here are part of our first scientific data release
as announced in §[5].



4 EXAMPLES OF APPLICATIONS 

4.1 Galaxy Number Counts in Observations and Simulations 

One of the most basic tests of the accuracy of
semi-analytic model predictions is to compare the number counts
of galaxies observed as a function of apparent magnitude in some
band with those predicted by a mock lightcone observation constructed
from the semi-analytic model as described in Section [2.4].
However, as discussed in the introduction, these light-cones do not 
suffer from any of the observational effects afflicting real observa- 
tions. 

The MRObs approach to modeling discussed in Sections [2.3] to
[2.7] allows us to make a much fairer comparison between observations
and semi-analytic predictions. By simulating a mock survey
matched to the real observations that one wants to compare with,
and then running source extraction and photometry software on the
mock and real images in an identical manner, one can be sure that both
data sets will be affected by any observational biases in much the
same way.

Figure 18. Galaxy number counts as a function of magnitude in the lightcone (black solid line), the real CANDELS UDS data (red solid line), and those
extracted from the simulated CANDELS UDS data (blue solid line). While the lightcone data is known to over-predict the observed number counts to some
extent, the discrepancy between the observations and the model predictions is significantly reduced after folding the lightcones through the MRObs and
performing object detection and photometry from the mock images as performed on the real images. The large difference between the real and simulated data
at the bright end is due to Galactic stars that are absent in our simulations.

To compare number counts measured from a mock lightcone
to those derived from our mock images based on that lightcone, we
plot the two in Fig. [18]. The figure shows counts from our simulated
CANDELS UDS data in blue compared to the plain lightcone data
in black. The counts extracted from the simulated image were not
corrected for completeness. At bright magnitudes (J, H < 22 mag)
the counts are in good agreement, but toward fainter magnitudes
fewer galaxies are detected in the images than are present in the
lightcone on which the mock images are based. The extracted counts are
about a factor of 2 lower than the lightcone counts at J, H ~ 26.5
mag.
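
Differential number counts of the kind compared here can be computed from any magnitude list with a simple histogram. The sketch below is illustrative only; the function name, the choice of bins, and the normalization per magnitude per square degree are assumptions, not a description of how the figure was produced.

```python
import numpy as np

def number_counts(mags, area_deg2, bin_edges):
    """Differential number counts in units of mag^-1 deg^-2."""
    counts, _ = np.histogram(mags, bins=bin_edges)
    widths = np.diff(bin_edges)
    return counts / (widths * area_deg2)
```

Applying the same function, bins, and survey area to the lightcone magnitudes and to the SExtractor magnitudes gives two curves whose ratio per bin measures the fraction of galaxies recovered without a completeness correction.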

The red lines in the figure show number counts measured in 
the real CANDELS UDS data (no completeness corrections ap- 
plied). At the faint end, the lightcone substantially over-predicts the 
observed counts, similar to discrepancies between semi-analytic 
predictions and observations found in earlier studies. However, it 
is very interesting to note that the difference between the semi- 
analytic predictions and the real number counts becomes smaller 
when we compare the real data to our mock data. Simply by 'ob- 
serving' the lightcone we already lose a significant number of 
galaxies that would not be detected in a real observation (if the 
lightcone was an accurate reflection of reality). 

The results presented in Fig. [18] suggest that it is important to 
take observational effects into account when comparing real data 
with simulations. These effects need to be quantified before one 
can change the parameters in a semi-analytic model to better match
the observations. With the mock data produced by and published 
through the MRObs these tests can now be performed easily. A 
more detailed analysis of the number counts in synthetic observa- 
tions as predicted by the MRObs compared to those predicted by 



ordinary semi-analytical models will be presented in a follow-up 
paper. 



4.2 The Properties of Galaxy Clusters at low and high 
redshift 

Our new lightcone 'aiming' technique described in §[2.4.2] offers
an efficient way for predicting the detailed observational properties 
of, for example, galaxy clusters. Here we present mock SDSS and 
HST observations of a massive galaxy cluster at different redshifts 
and orientations. The cluster was selected from the roughly 3,000 
clusters in the MR, and has a total dark matter mass of ~7 x
10^14 M⊙ at z = 0. The selection was performed using the table of
friends-of-friends groups in the Millennium Run Database. After 
finding FOF groups in the right mass range, a random selection was 
made of a cluster. That cluster was traced backwards in time using 
the table with halo merger trees. At desired redshifts the position of 
the cluster's main progenitor was returned. That position, together 
with a direction and using the comoving distance corresponding to 
the redshift, was used to define a light cone that had the cluster at 
its center and at exactly the correct redshift. This cone was then 
observed using a few different virtual telescope configurations. 
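
The geometric idea behind this aiming step can be sketched as follows: the observer is placed at the comoving distance of the target redshift behind the cluster along the chosen viewing direction, using the periodicity of the simulation box. This is only an illustration and not the MRObs implementation; the function name is hypothetical, positions and distances are assumed to be in the same comoving units, and the 500 Mpc/h box size of the Millennium Run is used as the default.

```python
import numpy as np

BOX_SIZE = 500.0  # side length of the MR box in comoving Mpc/h

def observer_position(cluster_pos, direction, comoving_distance, box_size=BOX_SIZE):
    """Observer position such that the cluster lies at comoving_distance
    along `direction`, wrapped into the periodic box."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    pos = np.asarray(cluster_pos, dtype=float) - comoving_distance * d
    return np.mod(pos, box_size)
```

Changing only `direction` while keeping the cluster position and distance fixed yields the different viewing angles of the same structure.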

In the first example, we have produced mock SDSS images in 
g, r, i showing what this cluster would look like at redshifts from 
z = 0.02 to z = 0.21 (Fig. [19]). These mock data can be compared 
directly with real clusters found in the SDSS. It is clear from Fig. 
[19] that the study of galaxy clusters in the SDSS survey becomes 
challenging already at moderately high redshifts. As a second ex- 
ample, we therefore show a mock image of the same cluster, now 
seen at z = 0.4 and observed with HST/ACS in the filters g, r, z
(Fig. [20], left panel). In the right panel we show an actual HST image
of the well-studied z = 0.4 cluster Cl0024 (Jee et al. 2007).
Qualitatively speaking, this cluster resembles our simulated clus- 
ter quite well. Users will be able to use mock observations such as 
these to compare the properties of simulated and real clusters in a 
quantitative manner. 
Figure 19. A high mass galaxy cluster as it would appear at redshifts of z ≈ 0.02 (panel a), z ≈ 0.09 (panel b), and z ≈ 0.21 (panel c) in an SDSS-type
survey. These cluster images are based on our lightcone aiming technique described in §[2.4.2].
Figure 20. Mock HST/ACS grz image of a massive galaxy cluster in the MR simulations seen at z = 0.4 (left) versus a real HST/ACS grz image of the
galaxy cluster Cl0024 at z = 0.4 (Jee et al. 2007) (right). These cluster images are based on our lightcone aiming technique described in §[2.4.2].



Our third example highlights another unique feature of our 
improved lightcone technique, which allows us to produce obser- 
vations of structures seen from different directions. Each light cone 
is created following the same principle as above, the only differ- 
ence being that the cluster is observed from different directions. 
In Fig. [21] we show mock HST images in Viz of the same cluster
shown before, but now at z ~ 1.07. Panels show the exact same
cluster viewed from three different directions, with (proto-)cluster
galaxies having log(M*/M⊙) > 10 marked with white circles. The
large yellow circle marks the virial radius of the central halo. While 
the projected distribution of cluster galaxies appears roughly spher- 
ical in the first two orientations (left and middle panels), it is much 
more filamentary in the third orientation (right panel). The line- 
of-sight velocity dispersions in the three cases are 807, 704, and 
568 km s⁻¹. This example illustrates that projection effects are
important to take into account when studying the assembly of galaxy
clusters, especially at high redshift where both the samples of clus- 
ters and the number of identified cluster galaxies are relatively 



small. The multi-wavelength nature of the MRObs data allows for
detailed testing, calibration, and tuning of cluster detection
algorithms using physically motivated cluster samples.
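
The dependence of the line-of-sight velocity dispersion on viewing direction can be illustrated with a short calculation. The sketch below assumes that the three-dimensional peculiar velocities of the member galaxies are available and ignores the Hubble flow across the cluster; the function name is hypothetical and this is not necessarily how the values quoted above were obtained.

```python
import numpy as np

def los_velocity_dispersion(velocities, line_of_sight):
    """Sample standard deviation of velocities projected on a direction.

    velocities: (N, 3) array of peculiar velocities in km/s.
    line_of_sight: 3-vector giving the viewing direction (any length).
    """
    n = np.asarray(line_of_sight, dtype=float)
    n = n / np.linalg.norm(n)
    v_los = np.asarray(velocities, dtype=float) @ n
    return np.std(v_los, ddof=1)
```

For an anisotropic system, such as one with members streaming along a filament, the same galaxies give a larger dispersion when viewed along the filament than across it.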



4.3 Colors and Structural Properties of Galaxies 

Another new test facilitated by the MRObs is comparison of the 
structural properties of galaxies in the semi-analytic model to those 
in real observations. In Fig. [22] we show a stellar mass versus SFR 
diagram for galaxies between z = 1.5 and z = 2.5 selected from 
one of our mock lightcone catalogs. In the panel on the right, we 
show the actual postage stamp image of each galaxy indicating its 
appearance in mock HST data (the image stamps are drawn from 
our mock 9 filter color-composite image based on the HST/ERS 
survey). These mock data can be used to measure galaxy structural 
properties (e.g., Sersic index, bulge-to-disk ratio, inclination), sizes 
and colors in exactly the same way as typically performed on real 
Figure 21. Mock HST/ACS Viz color composite image of a massive galaxy cluster at z ≈ 1.07 viewed from three different directions. While the projected
distribution of cluster galaxies appears spherical in the first two orientations (left and middle panels), it appears highly filamentary in the third orientation 
(right panel) indicating that projection effects can be important. The virial radius of the central halo is marked by a yellow square. 




Figure 22. For the first time, we have visualized the structural properties of galaxies in the semi-analytic models. In the left panel we show the stellar mass 
versus SFR diagram for galaxies between z = 1.5 and z = 2.5 made directly from the lightcone catalog. In the panel on the right, we show the actual postage 
stamp image of each galaxy indicating its appearance in mock HST data (the image stamps are drawn from our mock 9 filter color-composite image based 
on the HST/ERS survey). Quiescent objects that lie below the main star-forming sequence appear both redder and more compact compared to objects on the 
star-formation sequence. The colors, sizes and structural properties of these galaxy images can now be directly compared to similar galaxies in real data for a 
more accurate comparison. 



data only. By comparing measurements made based on the mock 
images with the exact physical quantities given by the semi-analytic
model, users can test how well such values can be recovered for a
given data set, or for a given galaxy population. It also allows users 
to compare quantitatively and directly the structural properties of 
mock and real galaxies in a relatively unbiased way. 



4.4 Selection of High Redshift Dropout Galaxies 

The last example we show here is the use of the MRObs in 
the selection of high redshift dropout galaxies from deep multi- 
wavelength imaging surveys. In Fig. [23] we show the color-color 
diagrams typically used to isolate galaxy samples at z ~ 4 (B- 
dropouts), z ~ 5 (V-dropouts), and z ~ 6 (i-dropouts). Objects at 
these high redshifts suffer severe attenuation from the IGM in their
Figure 23. Color-color diagrams commonly used to select galaxy samples at z ~ 4 (B-dropouts, left panels), z ~ 5 (V-dropouts, middle panels), and z ~ 6
(i-dropouts, right panels). Panels on the top show the color distributions of all objects in the lightcone. Panels on the bottom show the color distribution of the 
objects detected in mock images based on the same lightcone. Tracks indicate the typical colors of simple galaxy templates for various low redshift populations 
(red lines; irregulars: solid, Sbc: dotted, Elliptical: dashed) and high redshift dropouts (blue, with redshifts marked along the tracks). Shaded regions mark the 
color-color selection windows commonly used to select high redshift dropout candidates. 



spectra blueward of Lyα (see Sect. [2.5]). Consequently, these objects
can be isolated from lower-redshift galaxy populations, as their
Lyman break redshifts through a strategically chosen set of filters.
Panels on the top show color distributions for all objects found in 
one of our mock lightcones. The bottom panels show those objects 
that were detected in a mock survey based on the same lightcone. 
The limiting magnitudes used for the lightcone and for the extracted 
catalog were the same. This figure highlights some of the main dif- 
ferences between a pure semi-analytic model prediction (top pan- 
els) and what an observer actually sees (bottom panels). The colors 
of galaxies extracted from mock images are significantly scattered 
compared to their true (input) colors, making it harder to distin- 
guish between low and high redshifts, or to derive their physical 
properties (e.g., redshift, mass, dust, star formation history, SFR) 
based on fitting their observed colors to a set of spectral synthesis
models. It is straightforward to study and quantify such effects
through the use of this kind of mock data. In the MRDB, SQL
queries can be performed to cross-match the SExtractor output catalogs
to the lightcone or semi-analytic input catalogs, allowing one
to investigate in detail the offsets between intrinsic and apparent 
properties, and to study which galaxies are included and excluded 
by certain observational selection criteria (e.g. color-color selec- 
tions). 



5 PUBLIC ACCESS TO THE MROBS DATA 
5.1 MRObs database 

As described above, the MRObs builds upon and extends the pop- 
ular Millennium Run Database (MRDB). Apart from the images, 
all the datasets produced by the MRObs and described in this paper 
are stored in a database that is accessible through the same interface
as the MRDB itself⁹ and can be directly joined to the existing
data sets. Here we give a summary description of the database and 
access methods, focusing on the new data products and how they 
are linked to the existing ones. 

The MRDB is a relational database¹⁰ where data sets are
stored in tables (relations). A table generally stores objects of a
particular type, with properties of these objects stored in columns.
For example, we have tables storing the positions and velocities
of particles from an N-body simulation, albeit a small one. We
also have tables with FOF groups and sub-halos, as well as galaxies and
many more. The web site giving access to these tables provides all
information about the structure of the database.

9 See http://gavo.mpa-garching.mpg.de/Millennium for a publicly accessible
website giving access to the milli-Millennium database and information
on how to gain access to the full database.

10 This is not the place to describe relational databases in detail; there is
sufficient information available online.

An important feature of relational database design in general,
and the MRDB in particular, is the possibility to manifest relations
or links between objects in different tables. For example, a
galaxy in the Munich semi-analytical model is always embedded
in a subhalo. This relation is stored in the tables with galaxies as
a column storing the (unique) identifier of the corresponding halo.
The MRDB has a particularly rich set of such relations, especially
where it deals with the relations between objects of the same type
at different times (Lemson & Virgo Consortium 2006).
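The galaxy-to-subhalo relation described above can be illustrated with a small, self-contained sketch using an in-memory SQLite database. The table names, column names and values below are hypothetical stand-ins chosen for illustration; the actual MRDB schema is documented on the MRDB web site and may differ.

```python
import sqlite3

# In-memory toy database; schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE subhalo (haloId INTEGER PRIMARY KEY, mvir REAL)")
cur.execute(
    "CREATE TABLE galaxy (galaxyId INTEGER PRIMARY KEY, haloId INTEGER, stellarMass REAL)"
)

cur.executemany("INSERT INTO subhalo VALUES (?, ?)", [(10, 120.0), (11, 35.0)])
cur.executemany(
    "INSERT INTO galaxy VALUES (?, ?, ?)",
    [(1, 10, 5.2), (2, 10, 0.4), (3, 11, 1.1)],
)

# The galaxy table carries the identifier of its subhalo, so halo properties
# can be attached to every galaxy with a single join.
rows = cur.execute(
    """
    SELECT g.galaxyId, g.stellarMass, h.mvir
    FROM galaxy AS g
    JOIN subhalo AS h ON g.haloId = h.haloId
    ORDER BY g.galaxyId
    """
).fetchall()
conn.close()

for galaxy_id, stellar_mass, mvir in rows:
    print(galaxy_id, stellar_mass, mvir)
```

Storing only the identifier in the galaxy table avoids duplicating halo properties and keeps the two catalogs independently updatable.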

Recent additions to the database were the results of the latest
version of the Munich SAM from Guo et al. (2011) and pencil-beam
and all-sky light-cones derived from these in Henriques et al.
(2012). The images produced by the MRObs from such light-cones
do not lend themselves easily to storage in a database. However, the
SExtractor catalogues extracted from the images have been stored,
and we also have tables storing the different IGM absorption models
described in §2.5. More information and examples on how to
apply and cross-correlate the various MRDB and MRObs data sets
are documented at the URL given below.



5.2 Data products of the MRObs 

The MRObs delivers a number of entirely new data products to the 
community that are useful for independent analysis, or for serving 
as the starting point for new simulations. Here we will briefly de- 
scribe the different types of new products. 



5.2.4 Pre-observation maps ("perfect" model images)



As described in §2.6, for each filter we build a so-called pre-observation
or "perfect" image that is based on the input object
list. These images can be seen as a representation of the sky free 
of noise, PSF, or background. As such, they are easily convolved, 
rebinned, and scaled to match an arbitrary observation (typically a 
combination of a given telescope, camera, and exposure). 
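The three operations mentioned above (convolution, rebinning and scaling) can be sketched in a few lines of pure Python. This is only an illustration under simple assumptions: a Gaussian PSF, block-summing as the rebinning scheme, and a single multiplicative exposure factor. None of these choices, nor the function names, are taken from the MRObs pipeline.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def convolve2d(image, kernel):
    """Direct 2-D convolution with zero padding outside the image."""
    ny, nx = len(image), len(image[0])
    ky, kx = len(kernel), len(kernel[0])
    oy, ox = ky // 2, kx // 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = 0.0
            for j in range(ky):
                for i in range(kx):
                    yy, xx = y + oy - j, x + ox - i
                    if 0 <= yy < ny and 0 <= xx < nx:
                        total += image[yy][xx] * kernel[j][i]
            out[y][x] = total
    return out

def rebin(image, factor):
    """Sum factor x factor pixel blocks, which conserves the total flux."""
    ny, nx = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor))
             for x in range(nx)] for y in range(ny)]

def scale(image, factor):
    """Multiply every pixel by a constant (e.g. a hypothetical exposure factor)."""
    return [[v * factor for v in row] for row in image]

# A noise-free 8 x 8 "perfect" image containing one point source of flux 100.
perfect = [[0.0] * 8 for _ in range(8)]
perfect[3][4] = 100.0

psf = gaussian_kernel(3, 1.0)
observed = scale(rebin(convolve2d(perfect, psf), 2), 10.0)
total_flux = sum(sum(row) for row in observed)
print(len(observed), len(observed[0]), round(total_flux, 6))
```

Because the kernel is normalized and the rebinning sums pixels, the total flux changes only through the explicit scaling step. Noise and background would be added afterwards in a full simulation.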



5.2.5 Simulated images 

The "perfect" images are turned into synthetic images that simu- 
late real observational data. These images can be downloaded for 
further analysis. We also provide the PSF images that were used to 
convolve the perfect images to the instrument resolution, as well as 
documentation providing full details of how the images were pro- 
duced. 



5.2.6 SExtractor products

The simulated images are processed using SExtractor to produce 
the so-called segmentation maps identifying which image pixels 
correspond to which detected object, as well as the standard SEx- 
tractor output photometry catalogs. The SExtractor catalogs are 
made available through the MRDB where they can be searched or 
cross-matched with other data, such as lightcone catalogs, semi- 
analytic snapshots, dark matter halos, or density fields. The seg- 
mentation images are available for download. 
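To illustrate how a segmentation map can be used together with its image, the sketch below sums the pixels belonging to each detected object. It assumes the usual SExtractor convention that each pixel of the segmentation map holds the number of the object it belongs to, with 0 marking background. The tiny arrays are invented for the example.

```python
# Toy 4 x 4 image and matching segmentation map (values are illustrative).
image = [
    [0.1, 0.0, 2.0, 3.0],
    [0.0, 1.0, 4.0, 1.0],
    [1.5, 2.5, 0.2, 0.0],
    [0.5, 0.0, 0.1, 0.0],
]
segmap = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 0],
    [2, 0, 0, 0],
]

def object_pixels(image, segmap):
    """Return {object_id: (pixel_count, summed_flux)}, skipping background (0)."""
    result = {}
    for img_row, seg_row in zip(image, segmap):
        for value, obj_id in zip(img_row, seg_row):
            if obj_id == 0:
                continue
            npix, flux = result.get(obj_id, (0, 0.0))
            result[obj_id] = (npix + 1, flux + value)
    return result

stats = object_pixels(image, segmap)
for obj_id in sorted(stats):
    print(obj_id, stats[obj_id])
```

The same masks can be applied to images in other filters to measure fluxes within identical apertures.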



5.2.1 Multi-wavelength lightcone catalogs with structural 
properties 

The random field lightcones released as part of this paper are identical
to the 24 multi-wavelength lightcones measuring 1.4° x 1.4° on
the sky from Henriques et al. (2012), but with structural information
added. The new structural information (sizes of the disk and
bulge components, inclinations and position angles) is crucial for
building the accurate galaxy models predicted by the MR simulations.
These lightcones can be used, for example, to compare structural
properties measured from the simulated images to the true input
values. They can also be used as the starting point for users wishing
to perform their own image simulations using realistic input catalogs
based on the MR. In addition to the "random" lightcones, we
also release entirely new lightcones that specifically target galaxy
clusters at a range of redshifts (see §2.4.2 and §4.2). All these lightcone
catalogs are made available through the MRDB.



5.2.2 IGM tables 

We provide tables that list the mean IGM attenuation as a function 
of redshift for a range of models. The IGM tables are applied to the 
lightcones to predict accurate colors and magnitudes of galaxies 
with redshift. 
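A minimal sketch of how such a table might be applied is given below. It assumes a table of mean transmission versus redshift for a single filter, linear interpolation between entries, and the standard conversion of a transmission T into a magnitude offset of -2.5 log10(T). The table values and function names are hypothetical and do not reproduce any of the released IGM models.

```python
import math

# Hypothetical (redshift, mean transmission) pairs for one filter.
igm_table = [(2.0, 1.00), (3.0, 0.80), (4.0, 0.50), (5.0, 0.20)]

def mean_transmission(z, table):
    """Linearly interpolate the mean transmission; clamp outside the table range."""
    if z <= table[0][0]:
        return table[0][1]
    if z >= table[-1][0]:
        return table[-1][1]
    for (z0, t0), (z1, t1) in zip(table, table[1:]):
        if z0 <= z <= z1:
            return t0 + (t1 - t0) * (z - z0) / (z1 - z0)

def attenuated_magnitude(mag, z, table):
    """Apply the mean IGM attenuation to an unattenuated apparent magnitude."""
    return mag - 2.5 * math.log10(mean_transmission(z, table))

m_low = attenuated_magnitude(25.0, 1.0, igm_table)   # no attenuation below z = 2
m_high = attenuated_magnitude(25.0, 3.5, igm_table)  # fainter by about 0.47 mag
print(round(m_low, 4), round(m_high, 4))
```

Since the attenuation depends on the filter as well as on redshift, a real application would hold one such table per filter and per IGM model.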



5.3 Simulated surveys currently available in the MRObs 

In its current deployment, the MRObs offers a number of data sets 
conveniently matched to some of the most popular extra-galactic 
surveys (e.g. the SDSS, CFHT-LS Wide and Deep, GOODS, UDF, 
GOODS/ERS, and CANDELS) for use by the community. Updates 
and future data releases will be announced through the MR web 
portal (URL given below), and in forthcoming publications. 



5.4 The MRObs Image Browser 

A special feature of the MRObs is that many of the data sets can 
also be accessed directly by means of our interactive MRObs im- 
age browser. This is an online tool that allows users to scan over 
and zoom into the synthetic images. These images are linked to 
the backend database (the MRDB) through a simple point-and- 
click function that allows retrieval of detailed information about 
the galaxies that are displayed. This is useful, for example, for fa- 
miliarizing oneself with the relation between physical and observed 
properties of different types of galaxies or galaxies at different red- 
shifts, for selecting interesting objects from the MR simulations 
for subsequent analysis, for comparing the quality expected for dif- 
ferent types of data sets or telescopes, and for didactical and out- 
reach purposes. Here we describe the main features of the MRObs 
browser in brief. 



5.2.3 Object lists 

Information from the structural light cones, the IGM tables, and 
a plate scale are combined to generate the input to the SkyMaker 
code that we use to create our synthetic "pre-observation" images. 
These object lists may be used by other synthetic image simulators. 
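The combination step can be sketched as follows. The example assumes a flat-sky approximation with offsets measured from the field centre, an IGM magnitude offset that has already been looked up for each galaxy, and an output row layout invented for this illustration. It is not the SkyMaker list format, and all field names are hypothetical.

```python
def make_object_list(galaxies, plate_scale, npix):
    """Convert lightcone rows into image-plane rows.

    plate_scale is in arcsec per pixel and npix is the image size in pixels.
    Each output row is (x, y, magnitude, bulge_to_total, disk_scale_arcsec,
    inclination_deg, position_angle_deg); this layout is illustrative only.
    """
    rows = []
    for g in galaxies:
        x = npix / 2.0 + g["dx_arcsec"] / plate_scale
        y = npix / 2.0 + g["dy_arcsec"] / plate_scale
        mag = g["mag"] + g["igm_dmag"]  # apply the IGM attenuation offset
        rows.append((x, y, mag, g["bulge_to_total"], g["disk_scale_arcsec"],
                     g["inclination_deg"], g["pa_deg"]))
    return rows

# One hypothetical galaxy, 9.0 arcsec right of and 4.5 arcsec below the field centre.
galaxies = [{
    "dx_arcsec": 9.0, "dy_arcsec": -4.5, "mag": 24.8, "igm_dmag": 0.5,
    "bulge_to_total": 0.3, "disk_scale_arcsec": 0.4,
    "inclination_deg": 60.0, "pa_deg": 35.0,
}]

object_list = make_object_list(galaxies, plate_scale=0.09, npix=20000)
for row in object_list:
    print(" ".join(f"{value:.3f}" for value in row))
```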



5.4.1 Deep zoom RGB image pyramids 

The images produced by the MRObs are typically very large. For
example, a simulated HST survey covering an area of 30' x 30' at
a (drizzled) pixel scale of 0.09" already measures 20,000 x 20,000
pixels (400 Megapixels), and in principle the MRObs could create
much larger fields at much higher resolution than this. These images
therefore do not fit on a standard computer screen. Using a
technology similar to e.g. Google Maps, the MRObs Browser allows
users to efficiently pan around and zoom in on such large, high
resolution images. We here describe in some detail how we have
implemented this truly virtual telescope.

Figure 24. An image pyramid consisting of three levels is shown. At its
lowest resolution level, the image consists of a single 256 x 256 pixel tile
having 1/4th of the true image resolution. At the base of the pyramid, the
image is divided into 4 x 4 full-resolution tiles each measuring 256 x 256
pixels. The MRObs deep zoom image browser makes heavy use of these
kinds of tilings for efficient viewing of the data. The browser also
translates between pixel coordinates and WCS coordinates within each
viewport. This conveniently enables the user to retrieve high-level properties
of any object found within the image by matching its sky coordinates to
the underlying lightcone catalog, and by querying the galaxy or halo catalogs
stored on the MR database server based on the GALAXYID or HALOID
of any matches found.

First, the simulated multi-wavelength filter images are combined
into false-colour RGB composites. We use the publicly available
code STIFF¹¹, which handles the conversion from arbitrarily
large scientific FITS input images to standard TIFF format output
images (Bertin 2012). STIFF automatically (or manually) applies
contrast and brightness adjustments, colour balance and saturation,
and gamma corrections, producing colour images that are highly informative
of the level of detail present in the monochromatic input
FITS images. When we have multiple bands available for any of the
three RGB channels (for example when making colour composites
of data sets based on more than three filters), we reduce the number
of input images to three by creating variance-weighted averages
and use those as the input for each channel.

11 http://astromatic.net/software/stiff
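The band-combination step can be sketched as follows, reading "variance-weighted" as inverse-variance weighting with a single noise variance per band. That reading, the function name and the toy values are assumptions made for this illustration; the text does not specify the exact weighting scheme.

```python
def variance_weighted_average(images, variances):
    """Combine same-sized images using weights of 1/variance (one variance per image)."""
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    ny, nx = len(images[0]), len(images[0][0])
    return [[sum(w * img[y][x] for w, img in zip(weights, images)) / weight_sum
             for x in range(nx)] for y in range(ny)]

# Two hypothetical bands feeding one colour channel; the second band is noisier.
band_a = [[1.0, 3.0]]
band_b = [[3.0, 5.0]]
channel = variance_weighted_average([band_a, band_b], variances=[1.0, 3.0])
print(channel)
```

With these weights the less noisy band dominates the channel, so the combined pixel values lie closer to those of the first band.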

From this high-resolution image we then create a so-called
"image pyramid" consisting of representations of this high resolution
image at ever decreasing resolution. The method is illustrated
in Fig. 24. The top of the pyramid (level 0) consists of a single
s x s pixel low resolution image that is a heavily rebinned version
of the original or full-resolution N x N pixel image. The next
level contains p^1 x p^1 image tiles, each of p times higher resolution
compared to the previous level. At the nth level (corresponding to
the base of the pyramid), there will be p^n x p^n tiles, each representing
only a small portion of the original image but now at its highest
resolution.

The browser software¹² uses this data format to download
only those tiles that are required to show the image at the current
zoom level. This significantly reduces the download time and creates
a smooth transition between the different levels or different
regions of the image when viewed in a web browser. For example,
if we adopt a scaling factor of p = 2 between levels, tiles of
s = 256 pixels, and an original image of N = 32,768 pixels, the
last level (level 7) will consist of 128 x 128 tiles of 256 x 256 pixels.
This means that only about 0.1% of the data needs to be downloaded
at any time to display a particular region at its fullest resolution on
a 1024 x 1280 resolution display.
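The arithmetic of this example can be checked with a short sketch. The function below simply counts tiles per side at each level for given N, s and p; its name and structure are illustrative and are not part of the browser software.

```python
def tiles_per_side(full_size, tile_size, p):
    """Return the number of tiles along one side at each level, level 0 first."""
    levels = []
    size, tiles = tile_size, 1
    while size < full_size:
        levels.append(tiles)
        size *= p
        tiles *= p
    levels.append(tiles)  # base of the pyramid (full resolution)
    return levels

levels = tiles_per_side(full_size=32768, tile_size=256, p=2)
last_level = len(levels) - 1
base_tiles = levels[-1]

# Fraction of the full-resolution pixels that fit on a 1024 x 1280 display.
pixel_fraction = (1024 * 1280) / (32768 * 32768)

print(f"last level: {last_level}, tiles per side at base: {base_tiles}")
print(f"displayed fraction of full image: {100 * pixel_fraction:.2f}%")
```

Since tiles are fetched whole, a viewport that is not aligned with the tile grid requires slightly more data than this pixel fraction, but the total remains of order 0.1% of the full image.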

5.4.2 User interface of the MRObs Browser 

The MRObs Browser offers the user the choice of a large number 
of image pyramids, based on sets of different mock images, for a 
variety of virtual telescopes and with different wavelength bands. 
Each mock image is derived from a light cone stored in the MR 
Database, and the MRObs Browser allows interactive querying of 
these cones. 

Screenshots of an HST simulation viewed through the MRObs
Browser are shown in Fig. 25. Clicking the image leads to an SQL
query being submitted to the database that searches for the nearest
galaxy to the selected (virtual) sky position, up to a maximum
radius (currently 1"). If a galaxy is found, a large amount of information
is retrieved and displayed in a table on the screen next to the
image, as shown on the right hand side of the screenshots in Fig. 25.
The selected galaxy is indicated on the image with a small white
square (top panel). The table includes information on the galaxy in
the light cone, such as redshift, apparent sizes and luminosities in
up to 40 bands. The observer-frame SED is shown in graphical form
above the table. The table also includes, through the linking of the light-cone
galaxy to the underlying semi-analytical galaxy catalogues,
information about physical parameters such as stellar mass, gas
mass, metallicities and rest frame magnitudes in the SDSS bands.
The information also includes details about the original dark-matter
subhalo and friends-of-friends group the galaxy belongs to. The latter
information can in turn be used to search for all other galaxies
in the image that belong to the same FOF group as the selected
galaxy. In the bottom panel, the positions of all galaxies that were
retrieved are indicated on the screen. The structure turns out to be
a galaxy group at z ≈ 0.5.
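The logic behind the click query can be sketched in plain Python, under the assumption of a small-angle separation formula and a catalog held in memory. In the browser this search is performed by an SQL query against the database; the field names, coordinates and the search radius used below are placeholders chosen for illustration.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec between two positions given in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return 3600.0 * math.hypot(dra, ddec)

def nearest_galaxy(catalog, ra, dec, max_radius_arcsec):
    """Return the catalog row closest to (ra, dec), or None if none is within the radius."""
    best, best_sep = None, max_radius_arcsec
    for row in catalog:
        sep = separation_arcsec(ra, dec, row["ra"], row["dec"])
        if sep <= best_sep:
            best, best_sep = row, sep
    return best

def fof_members(catalog, galaxy):
    """Return all rows sharing the friends-of-friends group of the given galaxy."""
    return [row for row in catalog if row["fofId"] == galaxy["fofId"]]

# Hypothetical lightcone rows (positions in degrees).
catalog = [
    {"galaxyId": 1, "ra": 0.10000, "dec": 0.20000, "fofId": 7},
    {"galaxyId": 2, "ra": 0.10050, "dec": 0.20020, "fofId": 7},
    {"galaxyId": 3, "ra": 0.15000, "dec": 0.25000, "fofId": 9},
]

clicked = nearest_galaxy(catalog, ra=0.10001, dec=0.20001, max_radius_arcsec=1.0)
group_ids = [row["galaxyId"] for row in fof_members(catalog, clicked)]
missed = nearest_galaxy(catalog, ra=0.30000, dec=0.30000, max_radius_arcsec=1.0)
print(clicked["galaxyId"], group_ids, missed)
```

A click far from any galaxy returns nothing, which corresponds to the case where no information panel is shown.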

The query capabilities of the MRObs Browser will be ex- 
tended over time and will be tied to the plain SQL query capabilities 
of the MRDB. 

12 We use the Deep Zoom technology developed by Seadragon/Microsoft
embedded in custom-written JavaScript libraries.

Figure 25. Screenshots of the MRObs v0.9 image browser available online. Top panel: basic view of the browser showing a small region of our synthetic
HST GOODS observation (the particular colour image pyramid shown here is composed of the filters V,i,z). Users can pan around and zoom the synthetic
observation, and directly query the MRDB by clicking on a galaxy. Information about the selected object (marked by a white square) is retrieved from the
MRDB, and displayed in the information panel on the right-hand side of the screen. The MRObs shows a broad-band spectrum of the object, as well as about
one hundred attributes of this object retrieved from the MRDB (e.g., size, SFR, stellar mass, age, redshift, magnitudes, black hole mass and dark matter halo
virial mass and radius). Bottom panel: One can highlight all galaxies belonging to the same FoF group as the selected galaxy. In this case, the selected galaxy
is the central galaxy of a galaxy group at z ≈ 0.5 (red squares: galaxies that are orphan (type 2) galaxies of the central halo; yellow squares: galaxies that are
satellite (type 1) galaxies of the central halo; white square: the central (type 0) galaxy).

6 SUMMARY

In order to make predictions in the observational plane and to allow 
unbiased comparisons between semi-analytic models and real data, 
we have developed the Millennium Run Observatory (MRObs), a 
new virtual telescope facility that can be used to simulate real- 
istic observational data based on the semi-analytic model galaxy 
catalogues associated with the dark matter Millennium Run Simulations.
The MRObs allows one to produce scientific image data
sets in FITS format. These artificial data can be analyzed using
the standard tools routinely used for analyzing real observations, 
allowing a relatively unbiased comparison between SAMs and real 
data. This contrasts with previous studies that compare highly ide- 
alized SAM predictions to observational data. The new technique 
will help - but is by no means limited - to: 

• Extend the Millennium Run project approach by producing 
data products directly corresponding to observations, namely syn- 
thetic images and extracted source catalogs 

• Aid theorists in testing analytical models against observations 

• Aid observers in making detailed predictions for observations 
and better analyses of observational data 

• Allow the community to subject the models to new kinds of 
tests 

• Allow observers and theorists to work toward each other from 
either direction with the freedom of where to meet 

• Allow detailed comparisons with synthetic observations pro- 
duced by other groups performing cosmological simulations 

• Allow calibration of observational analysis methods by mak- 
ing available synthetic data for which the entire underlying physical 
"reality" is known. 

• Extend the realism with which semi-analytic models can ad- 
dress questions such as what is the probability that a z ~ 10 galaxy 
will be detected within a particular observational data set? 

• Provide a framework for future virtual theoretical observato- 
ries 

One of the great advantages provided by our extended model- 
ing approach is that for the synthetic observations produced by the 
MRObs, the physical properties (e.g., dark matter halo mass, SFR, 
stellar mass, size, redshift) and photometric properties (e.g., magni- 
tudes and colors) of every galaxy are precisely known, in contrast to 
real observations where one does not know the exact or "true" an- 
swer. This makes the MRObs an ideal facility for calibrating many 
of the measurement and analysis techniques that are applied to real 
observations. The MRObs will allow observers and theorists to ap- 
proach a problem from different directions with freedom in decid- 
ing where to meet. 

We have introduced a modified lightcone technique that allows 
us to create lightcones aimed at selected objects or regions placed at 
any desired position or orientation. The new technique is useful for 
extending the range of questions that can be asked of the MRObs, 
such as what would be the appearance of a particular galaxy cluster 
at z ~ 1? What does this same cluster look like at z ~ 6 or at 
z = 0? How is the interpretation of observations of such structures 
affected by viewing angle or chance superpositions? Special cones 
centred on clusters at a range of redshifts have been added to the 
MRDB for studies of cluster evolution. 

Attenuation by the IGM is applied to the light cones statistically
using the baseline model from Madau (1995), as well as
two more recent implementations based on Monte Carlo modeling
of the IGM by Harrison et al. (2011). Our IGM attenuation tables
have also been added to the MRDB such that they can be used to
apply "on the fly" IGM absorption corrections to lightcones (see
the MRObs URL for a tutorial on how to apply the IGM absorption
corrections to the lightcones from Henriques et al. (2012), also
available in the MRDB). This is essential for making comparisons
with high redshift observations.

In order to allow the community to use our predictions as the 
basis for other mock observation experiments, we provide not only 
our final image products, but also the intermediate steps such as the 
input object lists and the pre-observation model images. 

In order to introduce the communities of theoretically and 
observationally inclined researchers to the "added-value" of the 
MRObs modeling approach, we have provided the following four 
example user cases: 

(1) We compared the galaxy number counts in the CANDELS/UDS
survey with the predicted counts taken directly from
the semi-analytic lightcone and with the counts extracted from
synthetic CANDELS images (Fig. 18 and §4.1). Interestingly,
the counts recovered from the synthetic images are lower than 
those predicted by the lightcone that was used to construct the 
synthetic observation, suggesting that the discrepancy between 
semi-analytic model predictions and observations may be smaller 
than previously claimed. The implications of this will be detailed 
in a followup paper (Overzier et al., in prep.). 

(2) We simulated images of galaxy clusters seen with SDSS
and HST at a range of redshifts (see the figures in §4.2). We also
showed synthetic images of the same galaxy cluster at z = 1.1 
from three different directions, illustrating that orientation effects 
can be important when interpreting the visual appearance of 
large-scale structure at high redshift. 

(3) We showed how the MRObs allows one to study the 
detailed structural properties of semi-analytic galaxies in synthetic 
images (Fig. 22 and §4.3). In these synthetic images one can
measure colors, sizes, bulge-to-disk ratios and profile shapes using 
standard observational techniques. The outcome of these measure- 
ments can then be compared to the intrinsic values provided by the 
MRObs, or to measurements performed on real galaxies. 

(4) We showed how the MRObs images can be used to search 
for high redshift dropout galaxies in a manner that is directly 
analogous to that used for real high redshift dropout searches (Fig.
23 and §4.4). This enables a much more realistic comparison with
the data, and allows us to assess how well we are able to derive the 
intrinsic physical properties from the observations. 

Extending the successful open-access approach of the MR 
project, we make available new data products for use by the com- 
munity. As part of our first data release, we have produced simu- 
lated data that emulates a number of key surveys, including SDSS, 
CFHT-LS (Wide and Deep), GOODS, HUDF, GOODS/ERS, and 
CANDELS (UDS, COSMOS and GOODS-S). The data sets are 
modeled using two different cosmologies (WMAP1 and WMAP7), 
two spectral synthesis models (BC03 and M05), and three IGM 
absorption models (MADAU, MEIKSIN, and INOUE-IWATA). In 
specific cases, we provide synthetic images that have exactly the
same geometry and noise properties as the reference survey.
The MRObs data can furthermore be explored using an online
image browser that allows users to interactively explore the
available mock observations. The browser graphically links objects
(galaxies) in the synthetic images to various types of information
available in catalogs in the MRDB. For each synthetic galaxy, this
information includes the physical properties of their dark matter 
halos, the intrinsic properties of the galaxy itself, the absolute and 
apparent photometric properties, and the observed properties recov- 
ered from the synthetic images using SExtractor. 

The public data and the MRObs browser can be accessed at 
the following URL: 

http://galformod.mpa-garching.mpg.de/mrobs/

In conclusion, the MRObs allows us to study our simulated 
universes through the eyes of our telescopes. We hope that the 
methods and data presented in this paper will encourage others to 
take advantage of the new opportunities offered by this approach.

Table 1. Filters currently available in the MRObs

FILTERNAME_DB   Description              FILTERNAME_DB   Description
U               Johnson U                ACS435          HST/ACS-WFC F435W
B               Johnson B                ACS475          HST/ACS-WFC F475W
V               Johnson V                ACS606          HST/ACS-WFC F606W
Rc              Cousins R                ACS625          HST/ACS-WFC F625W
Ic              Cousins I                ACS775          HST/ACS-WFC F775W
Z               UKIDSS Z                 ACS814          HST/ACS-WFC F814W
Y               UKIDSS Y                 ACS850          HST/ACS-WFC F850LP
J               UKIDSS J                 GFUV            GALEX FUV
H               UKIDSS H                 GNUV            GALEX NUV
K               Johnson K                NIC110          HST/NICMOS F110W
Ks              UKIDSS Ks                NIC160          HST/NICMOS F160W
i1              Spitzer/IRAC channel 1   VIMOSU          VLT/VIMOS U
i2              Spitzer/IRAC channel 2   WFC105          HST/WFC3-IR F105W
i3              Spitzer/IRAC channel 3   WFC125          HST/WFC3-IR F125W
i4              Spitzer/IRAC channel 4   WFC160          HST/WFC3-IR F160W
SDSS_u          SDSS u'                  WFC225          HST/WFC3-UVIS F225W
SDSS_g          SDSS g'                  WFC275          HST/WFC3-UVIS F275W
SDSS_r          SDSS r'                  WFC336          HST/WFC3-UVIS F336W
SDSS_i          SDSS i'                  WFPC300         HST/WFPC2 F300W
SDSS_z          SDSS z'                  WFPC450         HST/WFPC2 F450W